Included in this report are results of investigations into the
following  topics:  design  and  performance  evaluation  of  optimal  and 
suboptimal  estimation  and  tracking  systems  for  space-time  point-process 
observations;  optimal  signal  design  for  coded,  direct-detection  optical 
communication  systems;  informationally-decentralized  shortest  path  al- 
gorithms for  networks;  singular  estimation  and  control  problems;  compen- 
sator design  for  polynomial  matrix  descriptions  of  linear  multivariable 
systems;  a direct  proof  of  the  informational  equivalence  of  the  innova- 
tions and  observations  processes  for  linear  estimation  or  Gaussian  pro- 
cesses; and  quantitative  measures  of  controllability  and  observability 
and their implications in system analysis and performance evaluation.

1. INTRODUCTION

This  final  report  describes  the  research  conducted  under  Office  of 
Naval  Research  Contract  N00014-76-C-0667  from  March  1,  1976  to  June  30, 


This  research  has  had  as  its  principal  concern  the  investigation  of 
estimation,  decision  and  control  problems  for  systems  operating  in  an 
environment of uncertainty, with an increasing emphasis on
informationally-decentralized problems that arise typically in
connection with large systems, and particularly C³-systems, where
multiple decision-makers take actions on the basis of their own
individual knowledge and data.

Our  investigations  have  been  directed  both  towards  finding  exact, 
optimum  solutions  to  various  such  estimation,  decision  and  control  prob- 
lems and,  where  these  are  not  available  or  difficult  (or  infeasible)  to 
implement,  towards  providing  a basis  for  designing  and  assessing  the 
performance of satisfactory suboptimum solutions. An example that
encompasses both of these objectives is found in our research, described
in  Chapter  2,  into  estimation  and  tracking  problems  involving  space- 
time  point-process  observations.  There  we  derive  the  optimum  estimators 
and  controllers  and  show  them  to  be  nonlinear  but  finite-dimensional; 
they  are  thus  implementable,  but  evaluation  of  their  performance  re- 
quires infinite-dimensional  calculations.  We  have  therefore  derived 
easily-computed upper and lower bounds on the optimum performance; in
fact, the upper bounds give the exact performance of a parametrized
family of suboptimum designs that are even more easily implemented than
the optimum, and one of these is identified as providing better
performance than any other, thus making it the best design within this class.
When  significant  detector  dark  current  is  present,  the  exact  optimum  is 
infeasible  to  implement,  and  there  is  no  choice  but  to  examine  subopti- 
mum designs. 

This  example  illustrates  our  view  that,  because  implementable  opti- 
mum solutions  can  be  found  only  for  relatively  few  problems,  what  is 
frequently  needed  is  a shift  in  emphasis  away  from  optimal  designs  and 
towards  implementable  designs  that  achieve  satisfactory  performance. 
For this there is needed a setting within which easily-implemented
suboptimum designs can be identified, and their performance evaluated
through  a reasonably  complete  and  computationally-feasible  design  analy- 
sis. 

Even  so,  special  cases  for  which  exact,  optimum  solutions  can  be 
found  continue  to  be  important,  both  in  their  own  right  and  as  a basis 
for  suggesting  candidate  designs  for  broader  classes  of  problems,  and  in 
forming benchmarks with respect to which these candidate designs can
be  compared.  Problems  for  which  exact  solutions  have  been  derived  in 
the  course  of  this  research  include,  in  addition  to  that  described 
above,  an  optimum  signal  design  problem  involving  point-process  observa- 
tions; singular  estimation  and  control  problems;  and  shortest  path  prob- 
lems with  decentralized  information  and  topological  requirements. 

We  now  turn  to  an  outline  of  the  contents  of  the  chapters  that 
follow. 

In Chapter 2 we review our extended research effort into estimation,
detection, and tracking problems involving space-time point-process
observations. Such problems arise in direct-detection optical
communication  systems,  infrared  tracking  systems,  and  star  tracking 
systems,  all  of  which  have  a requirement  for  position  sensing  and  active 
tracking  to  maintain  optical  alignment.  For  these  problems  we  have  de- 
rived optimal  estimation  and  tracking  system  designs,  and  analyzed  the 
performance of these in terms of easily-computed upper and lower bounds.
Both  the  optimum  estimator  and  the  optimum  controller  are  of  interest  in 
that  they  are  nonlinear  but  finite-dimensional,  and  therefore  implement- 
able.  The  upper  bounds  also  give  the  exact  performance  of  suboptimum 
estimators  and  controllers  that  have  certain  minimality  properties  and 
are  even  more  easily  implemented  than  the  corresponding  optimum  system. 
The  last  section  of  the  chapter  discusses  our  recently-begun  examination 
of  the  case  where  there  is  significant  detector  dark  current  or  signifi- 
cant background  radiation,  thus  superimposing  on  the  problem  difficul- 
ties akin  to  those  arising  in  the  perhaps  more  familiar  problem  of 
tracking  an  object  in  clutter. 

Chapter  3 is  also  concerned  with  point-process  observations,  in 
this  case  the  design  of  optimum  signal  waveforms  for  coded,  direct- 
detection  optical  communication  systems.  Significant  new  results  with 
important  practical  consequences  have  followed  from  this  extended  pro- 
ject. It  is  of  interest  that  a modulation  scheme  we  show  to  be  optimum, 
when  an  average  energy  constraint  on  the  transmitted  signal  is  a limit- 
ing factor,  is  the  one  that  has  been  adopted  in  the  brass-boarded  one 
gigabit-per-second optical communication system currently under
development.

Our research into shortest path algorithms with decentralized
information and topological requirements is described in Chapter 4. There
we  present  algorithms  we  have  developed  that  enable  each  node  in  a net- 
work to  calculate  its  shortest  distance  to  any  other  node  using  only 
local  knowledge  of  the  network  topology  and  only  local  information 
transfer  between  adjacent  nodes.  Shortest  path  problems  arise  in  many 
contexts, and algorithms with such decentralized information
requirements are of obvious importance in many applications. One area of
particular applications interest is that of naval C³-systems.

These  algorithms  are  based  on  appropriate  modification  and  reinter- 
pretation of  labelling  algorithms  for  shortest  path  problems  in  order  to 
extract  the  desired  decentralized  properties.  All  converge  in  finite 
time,  even  if  implemented  asynchronously.  The  simplest  algorithm  was 
developed  with  a static  network  in  mind,  but  it  also  handles  decreasing 
branch  lengths  and  the  introduction  of  new  nodes  or  branches.  Changes 
are  needed,  however,  to  account  for  increasing  branch  lengths  or  the 
failure  of  nodes  or  branches.  Three  such  modifications  are  presented, 
each  retaining  the  basic  localized  information  properties.  Each  has  its 
own  characteristics,  and  its  applicability  or  suitability  is  a function 
of  the  particular  network  under  consideration. 
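
As an illustration of the kind of computation involved, the following is a minimal sketch of a decentralized label-correcting iteration in the distributed Bellman-Ford style. It is not the algorithm of Chapter 4: the function name, the dictionary representation of the network, and the random update order (standing in for asynchronous operation) are all assumptions made for the example. Each node reads only the distance tables of its adjacent nodes.

```python
import random


def decentralized_shortest_paths(graph, seed=0, max_sweeps=1000):
    """Hypothetical sketch: graph maps node -> {adjacent node: branch length >= 0}."""
    rng = random.Random(seed)
    nodes = list(graph)
    inf = float("inf")
    # Each node keeps its own table of distance estimates to every destination.
    table = {u: {d: (0.0 if d == u else inf) for d in nodes} for u in nodes}
    for _ in range(max_sweeps):
        changed = False
        order = nodes[:]
        rng.shuffle(order)  # arbitrary order mimics asynchronous updating
        for u in order:
            for d in nodes:
                if d == u:
                    continue
                # Node u consults only the tables held by its adjacent nodes.
                best = min(
                    (w + table[v][d] for v, w in graph[u].items()),
                    default=inf,
                )
                if best < table[u][d]:
                    table[u][d] = best
                    changed = True
        if not changed:  # no label changed in a full sweep: converged
            break
    return table


network = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2},
    "c": {"a": 4, "b": 2},
}
distances = decentralized_shortest_paths(network)
```

On a static network with nonnegative branch lengths the tables stop changing after finitely many sweeps. Because the estimates in this sketch can only decrease, it does not cope with increasing branch lengths or with failures, which is the situation the three modifications mentioned above are meant to address.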

In Chapter 5 we describe the new results we have obtained for
singular estimation and control problems. These results have followed
from  our  examination  of  these  problems  from  a geometric  viewpoint, 
utilizing  the  ideas  first  introduced  by  Wonham  and  Morse  and  by  Basile 
and Marro in connection with decoupling and other problems. We show
that certain fundamental subspaces introduced by Wonham and Morse in
that context also provide just the right framework for an extremely
simple characterization of the solution to the singular estimation or
control problem. The solution is as simple for multi-input, multi-output
systems  as  it  is  for  single-input  and/or  single-output  ones,  in  contrast 
to  available  results  based  on  algebraic  approaches,  where  the  strongest 
and  simplest  solutions  are  for  systems  with  a single  input  (in  the  case 
of  control)  or  a single  output  (in  the  case  of  estimation).  Our  geomet- 
ric characterization  reduces  easily  and  directly  to  known  algebraic  re- 
sults in  the  single-input  or  single-output  case.  Both  continuous-time 
and discrete-time problems are included within the same unified
development.

Recent  years  have  seen  a resurgent  interest  in  frequency-domain 
methods  for  the  analysis  and  design  of  linear  multivariable  systems,  in 
contrast  to  the  time-domain-based  state-space  approach  that  has  predomi- 
nated for  the  past  two  decades.  These  methods  are  based  on  polynomial 
matrix  descriptions  of  multivariable  systems  and,  in  fact,  have  both 
time-domain  and  frequency-domain  interpretations.  Chapter  6 contains  a 
description  of  our  investigation  into  design  methods  for  systems  repre- 
sented in  polynomial  matrix  form.  Our  objective  has  been  the  develop- 
ment of  methods  for  designing  compensators  for  such  systems,  and  partic- 
ularly minimum-order  compensators.  The  techniques  we  have  employed  draw 
on  the  ideas  and  objects  of  modern  algebra,  especially  the  theory  of 
modules  and  free  modules.  The  design  methods  we  have  developed  to  date 
apply primarily to single-output (multi-input) systems, since the
simplest available tools from modern algebra turn out to correspond to this
case. It is of interest to observe that this design problem is, for
polynomial matrix descriptions, the analog of observer-based compensator
design via state-space techniques: there, too, the theory for
single-output systems preceded, and is simpler than, that for multi-output
systems.

The successful "innovations approach" to estimation rests on the
equivalence of the information provided by the innovations process and
that provided by the observations process. This is known to hold true
in  some  circumstances,  and  to  fail  to  hold  in  others.  We  have  con- 
structed a new,  direct  proof  of  its  known  validity  for  an  important 
class  of  linear  least-squares  estimation  problems  and  problems  involving 
Gaussian  processes.  This  proof  is  outlined  in  Chapter  7. 

Finally, in Chapter 8, we present some preliminary results of our
recently-begun efforts to develop quantitative measures of
controllability and observability that have implications in design and
performance evaluation methods for large systems. Almost all of linear
system theory consists of sharply-defined answers to sharply-posed
questions. For example, a system is either controllable or not, or
decoupled or not. Disturbances must be completely rejected for
disturbance localization to be said to take place. There is no body of
theory that allows for approximate achievement of these goals. A
disturbance or another input may affect a certain output to an
acceptably slight degree, but if the effect is nonzero our present sharp
formulations have us conclude that the disturbance is not rejected or
that the output is not decoupled from the input. What seems especially
needed are quantitative measures of controllability and observability
that have implications, in terms, say, of performance bounds, on such
problems of approximate disturbance localization or approximate
noninteraction, and on the performance of estimators or controllers
designed on a noninteracting basis but implemented on an interacting
collection of subsystems. The long-term goal of this research project is
the provision of such a framework for decentralized design and
performance evaluation.

A number  of  the  chapters  that  follow  are  concerned  with  work  that 
has  been  already  published  or  is  available  in  a report  that  has  been 
submitted  for  publication.  In  those  cases,  the  appropriate  papers  or 
reports are included as appendices, and the presentation is limited to
an  outline  of  the  results  that  are  established  there  in  detail.  Also, 
in  these  cases,  references  are  made  where  possible  to  the  list  of  ref- 
erences in  the  relevant  appended  paper  or  report. 

Chapter  4 was  prepared  with  the  assistance  of  doctoral  student 
Jeffrey  M.  Abram,  and  Chapter  6 with  the  assistance  of  doctoral  student 
Olive Y. Liu.

2. ESTIMATION AND CONTROL PROBLEMS WITH SPACE-TIME POINT-PROCESS
OBSERVATIONS

2.1 Introduction

A major  component  of  our  research  effort  has  been  concerned  with 
observation  models  other  than  the  familiar  "signal  in  additive  white 
Gaussian  noise"  structure.  Particular  attention  has  been  given  to  ob- 
servations that  take  the  form  of  a doubly-stochastic  space-time  counting 
process  whose  intensity  is  signal-dependent.  Estimation  and  control 
problems involving such processes arise in a number of contexts,
including quantum-limited optical communication and nuclear medicine. The
estimation  and  tracking  problems  associated  with  optical  communication 
systems  have  been  discussed  in  detail  as  motivation  in  the  papers  that 
have  resulted  from  this  study. 

New results of major significance have been obtained for both
estimation and control problems involving space-time counting process
observations. A sequence of new results culminated in the journal article:

"Estimation and Control Performance for Space-Time Point-
Process Observations," Ian B. Rhodes and Donald L. Snyder,
IEEE Transactions on Automatic Control, Vol. AC-22, No. 3,
June 1977, pp. 338-346.

which  is  included  here  as  Appendix  3.  This  paper  includes  as  special 
cases  all  earlier  results  for  this  class  of  problems,  including  our  own 
earlier  research  under  this  contract  which  was  reported  in  the  journal 
article: 

"A  Separation  Theorem  for  Stochastic  Control  Problems  with 
Point  Process  Observations,"  D.  L.  Snyder,  I.  B.  Rhodes, 
and  E.  V.  Hoversten,  Automatica,  Vol.  13,  No.  1,  January 
1977, pp. 85-87.

and in the invited conference paper:

"Estimation  and  Control  Performance  for  Space-Time  Point- 
Processes,"  I.  B.  Rhodes  and  D.  L.  Snyder,  Proceedings  of 
the  Fourteenth  Allerton  Conference  on  Circuit  and  System 
Theory,  University  of  Illinois,  September  1976,  pp.  38-51. 

These  two  papers  are  included  here  as  Appendices  1 and  2. 

The  next  subsection  outlines  in  loose  terms  the  results  established 
in  detail  in  Appendix  3.  This  is  followed  by  a discussion  of  our  recent 
efforts  to  extend  these  results  to  the  situation  where  there  is  signifi- 
cant detector  dark  current  or  significant  background  radiation. 

2.2  Summary  of  Appendix  3 


In outline, the paper included as Appendix 3 considers a stochastic
system

\[ dx_t = F(t)\,x_t\,dt + G(t)\,u_t\,dt + V(t)\,dv_t \]

\[ dz_t = C(t)\,x_t\,dt + dw_t \]


where $u_t$ is a control variable, $v$ and $w$ are Wiener processes, and
the usual assumptions (detailed in the paper) are made. In addition to
the observation process $z$, we assume additional observations of a
space-time point process $N(t,r)$ in which each point occurrence has both
a temporal coordinate $t$ and a spatial location $r$. In an optical
communication setting, this point process might be thought of as a model
for photoelectron conversions on a detector surface, a particular point
occurrence corresponding to a conversion taking place at time $t$ and at
location $r$ on the detector surface. Associated with $N(t,r)$ is a
counting process $N_t$ which simply counts point occurrences regardless
of their spatial locations; $N_t$ is assumed to be a doubly-stochastic
process with stochastic intensity $\mu_t$. Given that a point occurrence
has taken place, its spatial location $r$ is taken to be a Gaussian
random vector with mean $H(t)x_t$ and covariance $R$. In terms of the
photoelectron conversion model mentioned above, the dependence of $r$ on
the system state reflects the (random) movement of the center of the
incident beam due to vibration, beam steering due to atmospheric
turbulence, the motion of the tracking system, etc. The control $u_t$
represents the input to the tracking system, which is included as part
of the total state $x_t$. The randomness of the temporal intensity
includes the transmission of information by modulating $\mu$, as well as
randomness due to such effects as fading during propagation.

For this model, we have examined

a) the estimation problem of finding the conditional density of the
processes $(x_t, \mu_t)$ at time $t$ given observations of both $z$ and
the space-time point process up to time $t$, and especially of finding
the associated conditional means and covariances;

b) the control problem of finding the control $u_t$ that depends at most
on the past of the space-time point process and $z$ and minimizes

\[ J[u] = E\left\{ \int_0^T \left[ u_t' P(t) u_t + x_t' Q(t) x_t \right] dt + x_T' S x_T \right\}. \]

Precise statements of these problems, their solutions and some
attendant technical assumptions are given in the paper. In simple
terms, the results we establish there are:

(i) Under assumptions that are reasonable from a practical
viewpoint, the joint problem of estimating both the temporal
intensity $\mu_t$ and the state $x_t$ reduces to two separate problems,
one of estimating $\mu_t$ from just the temporal component of
the space-time point process, and the other of estimating $x_t$
using all available observations. In terms of the optical
communication problem, this is of great practical importance
since it establishes that demodulation or detection can be
carried out independently of tracking (provided, of course,
optical boresight is maintained).

The demodulation or detection problem of estimating $\mu_t$ from the
temporal component of the space-time point process is a
standard one that has been solved under a variety of
assumptions on $\mu$ in the book by Snyder [Ref. 6 in Appendix 3].

We show in Appendix 3 that the conditional density of $x_t$ given
all observations up to time $t$ is Gaussian. Furthermore, the
conditional mean and covariance satisfy a pair of
finite-dimensional, nonlinear stochastic differential equations (see
eqs. (6) - (8) in Appendix 3). It should be emphasized that
although the optimum estimator is nonlinear it is
finite-dimensional and therefore implementable in practice.

(ii)  The  solution  to  the  control  problem  satisfies  a separation 
theorem  analogous  to  the  standard  linear-quadratic-Gaussian 
separation  theorem  of  linear  system  theory,  i.e.  the  optimum 
tracking controller separates into two separate and independent
components: an estimator and a control law. The estimator
is the finite-dimensional, nonlinear one described
above,  while  the  control  law  is  the  certainty-equivalent 
linear  one.  Being  finite-dimensional,  the  controller  can  be 
easily  implemented  in  practice.  This  separation  theorem  is 
important both theoretically and practically. From a theoretical
standpoint, it seems to be the only case besides the
standard  LQG  result  where  separation  of  a dual-control  problem 
into  two  independent  problems  has  been  established.  Not  only 
is  this  an  important  exact  result  in  its  own  right,  but  it  has 
the  as-yet  uninvestigated  potential  of  forming  a benchmark  for 
designing  and  assessing  the  performance  of  suboptimal  control- 
lers in  wider  situations,  much  as  we  have  previously  used  the 
standard  LQG  result  to  obtain  bounds  for  incrementally  conic 
nonlinear  systems.  From  a practical  viewpoint,  the  separation 
theorem  for  space-time  point-process  observations  provides 
the  simple,  optimum  design  for  an  important  class  of  tracking 
and  other  problems. 

(iii) Although the optimum estimator is finite-dimensional, its
error covariance depends on the point process occurrence times
and is thus a random process that is not precalculable (in
contrast to the deterministic, precalculable error covariance of
the standard Kalman filter). A natural measure of estimator
performance, which also turns out to determine controller
performance, then becomes the expected error covariance; however,
although this is deterministic, its calculation is
infinite-dimensional. We have, therefore, derived easily-computed
upper and lower bounds on the expected error covariance and
corresponding bounds on the optimum controller performance. The
upper bounds are derived by evaluating exactly the performance
of a parametrized family of suboptimum designs; one of these
is identified as having a smaller value of the performance measure
than any other, thus providing a minimal upper bound within this
family. The bound-minimal estimator and controller are thus natural
candidates for designs that are even more simply implemented than
the optimum, in that they require less on-line computation
because the gain coefficients are deterministic and precalculable
rather than stochastic and dependent upon the particular
realization of the counting process, $N_t$, as they are in the
optimum estimator.
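
The dependence of the error covariance on the occurrence times can be illustrated with a scalar sketch. Everything in it is an assumption made for illustration rather than the estimator of Appendix 3: a scalar state with no control and no $z$ observation, Euler time steps, hypothetical function names, and a point update taken to be the standard Gaussian conditioning formulas. Between points the variance evolves deterministically; at each point occurrence it jumps.

```python
def propagate(x_hat, var, f, v, dt):
    """One Euler step of dx = f*x*dt + v*dv between point occurrences."""
    return x_hat + f * x_hat * dt, var + (2.0 * f * var + v * v) * dt


def point_update(x_hat, var, r, h, r_var):
    """Gaussian conditioning on a point observed at location r."""
    gain = var * h / (h * var * h + r_var)
    return x_hat + gain * (r - h * x_hat), var - gain * h * var


def run_filter(points, n_steps, dt, f, v, h, r_var, x0, var0):
    """points maps a step index to the location r observed at that step."""
    x_hat, var = x0, var0
    history = []
    for k in range(n_steps):
        x_hat, var = propagate(x_hat, var, f, v, dt)
        if k in points:
            x_hat, var = point_update(x_hat, var, points[k], h, r_var)
        history.append(var)
    return x_hat, history


common = dict(n_steps=10, dt=0.1, f=-1.0, v=1.0, h=1.0, r_var=1.0, x0=0.0, var0=1.0)
_, early = run_filter({2: 0.0}, **common)  # one point, occurring early
_, late = run_filter({7: 0.0}, **common)   # the same point, occurring late
```

The two variance histories differ even though the model and the observed location are identical, which is why the error covariance of the optimum estimator cannot be computed in advance.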

2.3  Extensions  When  Detector  Dark  Current  is  Present 

Both  estimation  and  tracking  problems  are  greatly  complicated  when 
there  is  significant  detector  dark  current  (or  background  radiation). 
This  is  because  of  the  uncertainty  that  then  exists  as  to  whether  an 
observed  point  in  space-time  is  due  to  the  signal  process  or  to  the  dark 
current.  In  this  respect,  the  principal  difficulties  that  arise  are 
conceptually  similar  to  those  in  the  more  familiar  problem  of  tracking 
an  object  in  clutter,  where  again  uncertainty  exists  as  to  whether  an 
observation  corresponds  to  the  object  being  tracked  or  to  the  clutter. 
A recent summary of the problem of tracking in clutter can be found in
[1].

In either case, a solution to the estimation problem can be
obtained in principle by constructing a bank of estimators that expands
geometrically with successive observations, and appropriately weighting
the outputs of these to obtain the conditional mean. In our case, this
means that after $N$ point occurrences in space-time, there will be
required a bank of $2^N$ estimators of the type given in Appendix 3, each
estimator corresponding to one of the $2^N$ possible hypotheses as to which
points in the observed sequence are due to the signal and which to the
dark current. For each such hypothesis, the corresponding estimator
satisfies the equations (6) - (8) in Appendix 3, but including only
those observation points hypothesized as being due to the signal, and
neglecting those hypothesized as being due to the dark current. The
state $\hat{x}_{it}$ of the $i$-th estimator, corresponding to hypothesis
$H_i$ as to which observation points are due to the signal and which to
dark current, is the conditional mean of the state given both all
observed data to time $t$ and that hypothesis $H_i$ holds. The
conditional mean $\hat{x}_t$ of the state is then found as the linear
combination

\[ \hat{x}_t = \sum_i p_{it}\,\hat{x}_{it} \]

where $p_{it}$ is the conditional probability that hypothesis $H_i$ is true
given all observation data up to time $t$. Equations for the $p_{it}$ can be
developed under various sets of assumptions on the dark current process.
One simple possibility is to assume that the time component of the dark
current process is Poisson with rate $\nu$ and independent of the signal
process, and that, given a dark current point occurrence has taken
place, its spatial location is uniformly distributed over the detector
area $A$ (assumed large compared with the covariance of the signal-induced
spatial distribution) and is independent of the spatial locations of
prior and succeeding points. Even then, the equations for the $p_{it}$
become unwieldy, and they are not given here because, in any event, the
requirement of constructing such a rapidly-expanding bank of filters
makes this solution impractical in almost any conceivable application,
i.e. unless very few point occurrences are expected to take place.

One is, therefore, led to seek more-readily-implemented suboptimal
estimators. We have taken the approach of investigating estimators
whose dimension is that of the system state $x_t$, thus bypassing at the
outset the expanding-state requirement of the optimum estimator. We
adopt the same notation and the same models for the signal and the
signal-induced space-time point process as in our paper [Appendix 3],
and assume that the dark current satisfies the assumptions given towards
the end of the preceding paragraph. Let the first observed point be at
time $t$ and at location $r$. Over $[0,t)$ the optimum estimator is
$n$-dimensional and satisfies eq. (6) in Appendix 3 with the last term
identically zero since no points have yet occurred; indeed at time $t-$ the
conditional density of the state given the observations is
$G(\hat{x}_{t-}, \Sigma_{t-})$, i.e. Gaussian with mean $\hat{x}_{t-}$ and
covariance $\Sigma_{t-}$ given via eqs. (6) and (7) in Appendix 3, with,
in both cases, the last term identically zero. Under the hypothesis that
the observed point is due to the signal, the conditional density of $x$
at $t+$ is, from eqs. (6) - (9) in Appendix 3,
$G(\hat{x}_{t+}, \Sigma_{t+})$ with

\[ \hat{x}_{t+} = \hat{x}_{t-} + \Sigma_{t-} H' [H \Sigma_{t-} H' + R]^{-1} (r - H \hat{x}_{t-}) \]

and

\[ \Sigma_{t+} = \Sigma_{t-} - \Sigma_{t-} H' [H \Sigma_{t-} H' + R]^{-1} H \Sigma_{t-}. \]

Under the assumption that the observed point is due to dark current, the
conditional density of $x$ at $t+$ is the same as at $t-$, viz.
$G(\hat{x}_{t-}, \Sigma_{t-})$. It then follows that the conditional
density of $x$ at $t+$ given data to $t+$, including the observed point,
is the convex sum of Gaussian distributions

\[ f_{t+} = f(x_t \mid \text{data to } t+) = p_t\, G(\hat{x}_{t+}, \Sigma_{t+}) + (1 - p_t)\, G(\hat{x}_{t-}, \Sigma_{t-}) \]

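where $p_t$ is the conditional probability that the observed point at time
$t$ is due to the signal. Various equivalent expressions can be given for
$p_t$; one is

\[ p_t = \left[ 1 + (n/s) \exp\left( \tfrac{1}{2} \rho_t \right) \right]^{-1} \]

where

\[ \rho_t = (r - H\hat{x}_{t-})' [R + H \Sigma_{t-} H']^{-1} [r - H\hat{x}_{t-}], \]

and we assume for simplicity that the temporal intensity $\mu_t$ of the
space-time point process is constant. It then follows from straightforward
calculations that the conditional mean and covariance of $x$ at time $t+$
given data to $t+$ are, respectively,

\[ \bar{x}_{t+} = p_t\, \hat{x}_{t+} + (1 - p_t)\, \hat{x}_{t-} \]

\[ \bar{\Sigma}_{t+} = p_t\, \Sigma_{t+} + (1 - p_t)\, \Sigma_{t-} + p_t (1 - p_t) (\hat{x}_{t+} - \hat{x}_{t-}) (\hat{x}_{t+} - \hat{x}_{t-})' \]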


Although these expressions give the exact conditional mean and
covariance immediately following the first point observation, the
conditional density is not Gaussian but, rather, the convex combination of
Gaussian distributions given above. Thus, in contrast to the situation
that obtains in the absence of dark current, conditional Gaussian-ness
is not maintained across the first occurrence point, and this procedure
cannot be repeated through succeeding observation points. Indeed, after
the $N$-th observation point the conditional density is a convex
combination of $2^N$ Gaussian densities, and it is the generation of these
densities that is reflected in the $2^N$ estimators required in the exact
solution.

On the other hand, one natural approach to maintaining an
$n$-dimensional filter is to approximate the conditional density $f_{t+}$
following the first observation point by a Gaussian density with the same
mean and covariance as given above. This Gaussian approximation then
remains Gaussian as it is propagated to just before the next occurrence
point using eqs. (6) - (9) in Appendix 3. The above procedure is
repeated to incorporate the new space-time data point, and the process is
repeated.
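
The single-point step of this Gaussian approximation can be sketched for a scalar state as follows. The function names, the scalar setting, the one-dimensional detector of length `area`, and the constant rates `mu` (signal) and `nu` (dark current) are assumptions introduced for the example; the weighting is the posterior probability implied by a Gaussian signal density against a uniform dark-current density, and the merge is the usual mean and variance of a two-component mixture.

```python
import math


def signal_probability(r, x_prior, var_prior, h, r_var, mu, nu, area):
    """Posterior probability that a point at location r is due to the signal."""
    s = h * var_prior * h + r_var                 # innovation variance
    rho = (r - h * x_prior) ** 2 / s              # normalized squared residual
    signal = mu * math.exp(-0.5 * rho) / math.sqrt(2.0 * math.pi * s)
    dark = nu / area                              # uniform over the detector
    return signal / (signal + dark)


def moment_matched_update(r, x_prior, var_prior, h, r_var, p):
    """Collapse the two-hypothesis mixture to one Gaussian (mean, variance)."""
    gain = var_prior * h / (h * var_prior * h + r_var)
    x_sig = x_prior + gain * (r - h * x_prior)    # mean if the point is signal
    var_sig = var_prior - gain * h * var_prior    # variance if the point is signal
    x_new = p * x_sig + (1.0 - p) * x_prior
    var_new = (p * var_sig + (1.0 - p) * var_prior
               + p * (1.0 - p) * (x_sig - x_prior) ** 2)
    return x_new, var_new


p_near = signal_probability(0.0, 0.0, 1.0, 1.0, 1.0, mu=1.0, nu=1.0, area=10.0)
p_far = signal_probability(3.0, 0.0, 1.0, 1.0, 1.0, mu=1.0, nu=1.0, area=10.0)
x_half, var_half = moment_matched_update(2.0, 0.0, 1.0, 1.0, 1.0, p=0.5)
```

Setting `p` to 1 recovers the dark-current-free point update and setting it to 0 leaves the prior untouched; intermediate values add a spread term, so the merged variance can exceed the variance under either hypothesis alone.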

Evaluation of the performance of this suboptimum estimator is
difficult because the resulting equation for the mean-square-error,
$\Sigma$, depends not only on the occurrence times (as it does in the dark-current-
free  problem  in  Appendix  3)  but  also  on  the  spatial  locations  of  the  ob- 
servation points  through  both  p^  and  This  nonlinear  dependence  on 
the  spatial  locations  as  well  as  on  the  occurrence  times  greatly  compli- 
cates an  analysis  in  terms  of  bounds  comparable  to  that  performed  in 
Appendix  3 for  the  dark-current-free  case. 

We have begun to investigate parameterized families of suboptimum
estimators in which $p_t$ is restricted to being dependent only upon $r$
in a simple way, in combination with the suboptimal estimator eq. (16) in
Appendix 3. This means that the family of suboptimum estimators (16) in
Appendix 3 is modified to become

\[ d\hat{x}_t = F\hat{x}_t\,dt + G u_t\,dt + L(t)\,[dz_t - C\hat{x}_t\,dt] + \int p(r)\, M(t)\, [r - H\hat{x}_t]\, N(dt \times dr). \]


One possibility is to restrict p(r) to being, say,

\[
p(r) = \begin{cases}
1, & \text{if } [r - H\hat{x}_t] \text{, suitably weighted by } \Sigma \text{, is less than } a \\
0, & \text{otherwise}
\end{cases}
\]

where a is a parameter to be chosen and $\Sigma$ is the error covariance
associated with $\hat{x}_t$. In simple terms, this means that if r is
"sufficiently close" to its expected location $H\hat{x}_t$, "sufficiently
close" being determined by the parameter a, then the point occurrence
will be taken as being due to the signal; otherwise, it will be
neglected as being due to the dark current. Our objective is to find
choices of the gains L(t) and M(t) and of the parameter a that are in
some sense optimum, such as leading to a minimum error covariance
$\Sigma$. Our investigation of this problem is continuing. We remark that
a much simpler version of this problem has been examined by simulation
in [2], where a simpler criterion for accepting or rejecting points as
being due to the signal is employed.
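
The acceptance rule described above can be sketched in a few lines of
Python for a scalar state and a scalar spatial coordinate. This is only
an illustrative sketch under stated assumptions: the function names
`gate` and `point_update`, the scalar quantities `h`, `sigma` and
`m_gain`, and in particular the choice of normalizing the squared
residual by the predicted spatial variance `h * h * sigma`, are
hypothetical and are not taken from the report.

```python
def gate(r, x_hat, h, sigma, a):
    """Return 1.0 if the point at location r is accepted as signal, else 0.0.

    Assumption: "sufficiently close" is measured by the squared residual
    divided by the predicted spatial variance h*h*sigma (sigma > 0, h != 0).
    """
    residual = r - h * x_hat
    return 1.0 if residual * residual / (h * h * sigma) < a else 0.0


def point_update(x_hat, r, h, sigma, a, m_gain):
    """Jump in the estimate at a point occurrence: p(r) * M * [r - H x_hat]."""
    return x_hat + gate(r, x_hat, h, sigma, a) * m_gain * (r - h * x_hat)


if __name__ == "__main__":
    # A nearby point moves the estimate; a distant point is ignored.
    print(point_update(1.0, 1.1, 1.0, 0.04, 4.0, 0.5))
    print(point_update(1.0, 3.0, 1.0, 0.04, 4.0, 0.5))
```

A rejected point leaves the estimate unchanged, which mirrors the
statement that such an occurrence is neglected as being due to the dark
current.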


3. OPTIMUM SIGNAL DESIGN FOR CODED, DIRECT-DETECTION, OPTICAL
COMMUNICATION SYSTEMS

Important new results have followed from our new approach to the
coordinated design of the encoder, optical modulator and demodulator for
a digital communication system employing an optical carrier and direct
detection. These results are contained in the revised report:

"Some Implications of the Cutoff-Rate Criterion for Coded,
Direct-Detection, Optical Communication Systems," Donald
L. Snyder and Ian B. Rhodes, Biomedical Computer Laboratory
Monograph 363, Washington University, St. Louis, MO,
March 1979,

which is included as Appendix 4 and has been submitted for publication
in the IEEE Transactions on Information Theory. Individual results from
this comprehensive report (Appendix 4) have been presented at two
conferences and one workshop, and another conference presentation will
take place later this year:

"Signal Optimization for Random Point Processes," D. L.
Snyder and Ian B. Rhodes, AFOSR Workshop in Communication
Theory and Applications, Provincetown, Massachusetts,
September 17-20, 1978.

"Quantization Loss in Optical Communication Systems,"
Donald L. Snyder and Ian B. Rhodes, Sixteenth Allerton
Conference on Communication, Control, and Computing,
University of Illinois, October 4-6, 1978.

"Some Implications of the Cutoff Rate Criterion for Coded,
Direct-Detection, Optical Communication Systems," Donald L.
Snyder and Ian B. Rhodes, 1979 IEEE International Information
Theory Symposium, Grignano, Italy, June 25-29, 1979.

"Quaternary Pulse Modulation is Optimal for Optical
Communication at One Gigabit Per Second," Donald L. Snyder and
Ian B. Rhodes, National Telecommunications Conference,
Washington, DC, November 27-29, 1979.

Because Appendix 4 provides a complete account of the comprehensive
collection of results obtained in the course of this extended research
effort, we limit ourselves here to a very brief outline of the principal
conclusions.

The  basis  of  our  new  approach  has  been  the  reformulation  of  this 
signal  design  problem  to  use  the  cutoff  rate  as  a performance  measure 
instead  of  the  usually-employed  probability  of  error.  The  use  of  cut- 
off rate  as  a design  criterion  has  been  eloquently  and  persuasively 
argued  by  Massey  in  his  apparently  little-noticed  1974  conference  paper 
[Ref.  8 in  Appendix  4].  In  this  paper  he  also  examined  the  additive 
white  Gaussian  noise  channel  and  was  able  to  prove  for  the  first  time  a 
long-standing  conjecture  on  the  optimality  of  a simplex  signal  set. 

We have derived the cutoff rate for a digital communication system
employing an optical carrier and direct detection, and we have used this
as the performance measure in studying the coordinated design of the
optical modulator and demodulator. The choice of modulation that
maximizes the cutoff rate has been derived for various relationships
between peak amplitude and average energy constraints on the transmitted
optical signal, and found to be:

(i) When the average energy constraint is predominant, pulse
position modulation is optimum.

(ii) When the peak amplitude constraint predominates, Hadamard
matrices can be used to define an optimum choice of
modulation.

(iii) When neither constraint predominates, appropriate time sharing
of the solutions given in (i) and (ii) above is optimum.

We have also addressed within this framework problems of efficient
energy utilization, the choice of input and output alphabet dimensions,
and the effect of random detector gain.

Corresponding results are also shown to hold when polarization
modulation is employed in the optical modulator as well as temporal
modulation. Specifically, for an input alphabet of dimension 4, the
optimal modulation when average signal energy constraints predominate
employs binary pulse-position and binary polarization modulation; it is
of interest to note that such a modulation scheme has been adopted in
the one gigabit per second satellite optical communication system
reported by Ross et al. in [Ref. 15 of Appendix 4].
4. INFORMATIONALLY-DECENTRALIZED NETWORK PROBLEMS
4.1 Introduction

A major  effort  has  been  concentrated  on  developing  shortest  path 
algorithms  that  enable  each  node  in  a network  to  calculate  its  shortest 
distance  to  any  other  node  using  only  local  knowledge  of  the  network 
topology  and  only  local  information  transfer  between  adjacent  nodes. 

The  requirement  that  information  transfer  and  topological  information 
be  localized  contrasts  sharply  with  the  global  information  that  is  re- 
quired by  almost  all  of  the  many  existing  shortest  path  algorithms; 
the  implementation  of  these  algorithms  can  be  thought  of  as  requiring 
each  node  to  transmit  distance  and  topology  information  to  a central 
controller,  who  is  then  responsible  for  solving  the  problem  and  sending 
the  appropriate  optimal  routing  information  to  each  of  the  nodes.  In 
a large  network  this  could  involve  a significant  amount  of  communica- 
tion. Additionally,  for  some  networks  establishment  of  a central  con- 
troller may  be  expensive,  infeasible,  or  undesirable  from  a security  or 
reliability  viewpoint. 

Shortest path problems arise in many contexts, and algorithms with
decentralized topological and information transfer requirements are of
obvious importance in many applications. In addition to the traditional
applications areas, an area of particular applications interest is that
of naval C^3-systems, and an algorithm we have developed was presented
at the First MIT/ESL-ONR Workshop on Distributed Communication and
Decision Problems Motivated by Naval C^3-Systems held at MIT in August,
1978. A more detailed account of this algorithm appears in the
conference paper:

"A Decentralized Shortest Path Algorithm," Jeffrey M.
Abram and Ian B. Rhodes, Proceedings of the Sixteenth
Allerton Conference on Communications, Control and
Computing, University of Illinois, October 4-6, 1978, pp.
271-277,

which is included here as Appendix 5.

This algorithm was initially developed for a static network in
which branch lengths and topology remain constant, though it can
accommodate limited changes. We have subsequently made a number of
modifications to the algorithm to enable it to operate in a dynamic
network in which branch lengths can increase or decrease, and nodes or
branches can be added to or removed from the network. The ability of an
algorithm to handle such topological changes is essential in most
practical applications, including especially those arising in connection
with C^3-systems.

A brief  outline  of  the  algorithm  described  in  Appendix  5 is  given 
in  the  next  section.  This  is  followed  by  a description  of  several  modi- 
fications of  this  algorithm  to  accommodate  various  types  of  changes  in 
a dynamic  network. 
Very little topological information is needed. Each node needs to know
only  which  of  its  neighbors  are  attached  to  incoming  branches,  which  are 
attached  to  outgoing  ones,  and  the  lengths  of  the  branches  to  the  out- 
going neighbors.  For  each  ultimate  destination,  a node  calculates  and 
stores  an  assessment  of  the  shortest  path  via  each  of  its  outgoing 
links;  the  smallest  of  these  is  taken  to  be  its  assessment  of  the  short- 
est path  to  that  destination  and  is  subsequently  referred  to  as  the 
current  shortest  distance.  Also  stored  is  the  identity  of  the  outgoing 
neighbor  which  achieves  this  minimal  distance.  Initially,  the  current 
shortest  distance  is  taken  to  be:  for  ultimate  destinations  that  are 
neighboring  nodes,  the  corresponding  outgoing  branch  length;  for  all 
other  destinations,  infinity. 

Whenever  a node's  current  shortest  distance  to  a destination 
changes,  either  through  reinitialization  or  new  information  received 
from  a neighbor,  this  new  distance  is  transmitted  to  all  incoming  neigh- 
bors. At  the  conclusion  of  the  algorithm,  each  node  will  know  the 
shortest  distance  to  each  other  node  (or  that  no  path  exists),  the  next 
node  in  the  path  that  achieves  this  distance,  and  the  shortest  distance 
via  each  alternative  outgoing  node.  The  algorithm  is  guaranteed  to 
converge,  even  if  it  is  implemented  in  an  asynchronous  manner. 
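
The static algorithm outlined above can be sketched as a small
simulation. The following is a minimal sketch under stated assumptions,
not the implementation of Appendix 5: the function name
`decentralized_shortest_paths`, the dictionary-based data layout, and
the single first-in-first-out message queue standing in for the network
are all choices made here for illustration. Each simulated node uses
only the lengths of its own outgoing branches and the distances reported
by its outgoing neighbors.

```python
import math
from collections import deque


def decentralized_shortest_paths(branches):
    """Simulate the static algorithm on a directed network.

    branches maps (i, j) to the positive length of the branch from i to j.
    Returns (best, next_hop): best[n][d] is node n's current shortest
    distance to destination d, and next_hop[n][d] is the outgoing
    neighbor achieving it (None if no path exists).
    """
    nodes = sorted({n for branch in branches for n in branch})
    out_len = {n: {} for n in nodes}    # outgoing neighbor -> branch length
    incoming = {n: [] for n in nodes}   # neighbors with a branch into n
    for (i, j), length in branches.items():
        out_len[i][j] = length
        incoming[j].append(i)

    # via[n][d][k]: n's assessment of the distance to d through neighbor k.
    via = {n: {d: {k: math.inf for k in out_len[n]} for d in nodes if d != n}
           for n in nodes}
    best = {n: {d: math.inf for d in nodes if d != n} for n in nodes}

    # Messages in transit: (receiver, sender, destination, sender's distance).
    messages = deque()

    def announce(n, d):
        # A changed distance is transmitted to all incoming neighbors.
        for m in incoming[n]:
            messages.append((m, n, d, best[n][d]))

    # Initialization: neighboring destinations start at the branch length,
    # all other destinations at infinity.
    for n in nodes:
        for k, length in out_len[n].items():
            via[n][k][k] = length
            best[n][k] = length
            announce(n, k)

    while messages:
        receiver, sender, dest, dist = messages.popleft()
        if dest == receiver:
            continue
        via[receiver][dest][sender] = out_len[receiver][sender] + dist
        new_best = min(via[receiver][dest].values())
        if new_best != best[receiver][dest]:
            best[receiver][dest] = new_best
            announce(receiver, dest)

    next_hop = {n: {d: (min(via[n][d], key=via[n][d].get)
                        if best[n][d] < math.inf else None)
                    for d in best[n]} for n in nodes}
    return best, next_hop
```

The queue order here is arbitrary; the text's claim is that the result
does not depend on it, which is what permits asynchronous operation.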

4.3  Dynamic  Network  Algorithms 

While  the  above  algorithm  handles  static  networks,  many  problems 
arise  for  a dynamic  network  model.  The  phenomena  that  we  have  investi- 
gated include  branch  lengths  decreasing  and  increasing,  branches  being 
introduced  into  the  network,  and  branches  failing  or  being  removed  from 
the network. It should be noted that, in principle, a node coming up or
going  down  can  be  treated  as  the  incident  set  of  branches  simultaneously 
doing  the  same  thing. 

The  easiest  case  to  handle  is  that  of  decreasing  branch  lengths. 
A crucial  element  of  the  convergence  proof  for  the  static  algorithm  is 
that  every  distance  assessment  is  no  smaller  than  the  corresponding  true 
distance,  so  that  monotone  convergence  applies.  For  decreasing  branch 
lengths  this  condition  is  still  met  and  the  convergence  proof  remains 
valid. 

On  the  other  hand,  when  a branch  length  increases,  this  condition 
may  be  violated  and  the  original  convergence  proof  is  no  longer  appli- 
cable. The  algorithm  may  or  may  not  still  converge  for  a particular 
graph;  even  if  it  does,  convergence  may  be  slow,  as  illustrated  by  the 
following  example: 

Example  1. 


Let  node  4 be  the  destination  and  assume  that  the 
static  algorithm  has  converged  to  the  correct  solution. 
Thus  node  3,  for  example,  has  a shortest  distance  of  1, 
with  alternative  paths  via  node  1 or  2 of  length  3.  Now 
suppose that branch length s_34 increases to 20, giving

[Figure: the network after branch length s_34 has increased to 20]

Notice that all shortest paths to node 4 are affected by
the change. When the change occurs, node 3 can immediately
adopt 20 as his new direct distance to node 4. However,
he now believes that he can achieve a distance of 3
via  node  1 or  2.  Obviously,  this  distance  of  3 can  no 
longer  be  physically  achieved,  but  node  3 is  unaware  that 
his  alternative  paths  have  been  affected  by  the  branch 
increase.  He  now  tells  nodes  1 and  2 that  his  current 
shortest  distance  has  increased  to  3.  Node  1 now  com- 
pares his  new  distance  of  4 via  node  3 to  his  distance 
via  node  2,  and  decides  that  his  new  current  shortest 
distance  is  3.  Node  2 takes  similar  action.  They  then 
transmit  this  information  to  node  3,  who  now  increases 
to  4 his  assessment  of  the  distance  to  node  4.  This 
process  of  gradually  increasing  the  distance  assessments 
will  continue  until  the  true  distances  are  reached;  so 
even  with  this  small,  simple  example,  convergence  will 
take  quite  a long  time. 
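
The slow convergence in Example 1 can be reproduced with a short
simulation. This is a sketch under stated assumptions: the network
figure is not reproduced here, so the topology (nodes 1, 2 and 3
mutually connected by branches of length 1, and node 3 joined to node 4
by the branch whose length increases) is inferred from the distances
quoted in the example, and the use of synchronous message rounds and the
name `simulate_branch_increase` are choices made for illustration.

```python
def simulate_branch_increase(new_length=20):
    """Replay Example 1 with synchronous message rounds (destination: node 4)."""
    dest = 4
    length = {frozenset(b): 1 for b in [(1, 2), (1, 3), (2, 3), (3, 4)]}
    neighbors = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4]}

    # Converged state of the static algorithm before the change.
    best = {1: 2, 2: 2, 3: 1, dest: 0}
    via = {n: {k: length[frozenset((n, k))] + best[k] for k in neighbors[n]}
           for n in neighbors}

    # The branch between nodes 3 and 4 increases; node 3 notices at once.
    length[frozenset((3, 4))] = new_length
    via[3][dest] = new_length
    best[3] = min(via[3].values())
    changed = {3}
    history = [(best[1], best[2], best[3])]

    rounds = 0
    while changed:
        rounds += 1
        # Every node whose distance changed tells its neighbors.
        messages = [(k, n, best[n]) for n in sorted(changed)
                    for k in neighbors[n] if k != dest]
        for receiver, sender, dist in messages:
            via[receiver][sender] = length[frozenset((receiver, sender))] + dist
        changed = set()
        for n in neighbors:
            new_best = min(via[n].values())
            if new_best != best[n]:
                best[n] = new_best
                changed.add(n)
        history.append((best[1], best[2], best[3]))
    return best, rounds, history


if __name__ == "__main__":
    best, rounds, history = simulate_branch_increase(20)
    print(rounds, best)
    print(history)
```

Under these assumptions the distance assessments climb by one unit per
round until the true distances are reached, so the number of rounds
grows with the size of the branch increase rather than with the size of
the network.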


Some  modification  to  the  algorithm  is  necessary  in  order  to  guaran- 
tee convergence  when  branch  lengths  increase.  When  such  an  increase 
does  occur,  the  nodes  must  somehow  be  enabled  to  determine  which  dis- 
tance assessments  can  be  trusted,  and  which  cannot. 
With this goal in mind, we have developed several modified versions
of the static algorithm, three of which are described below. Each
version has its advantages and disadvantages, and the applicability and
suitability of each is a function of the characteristics of the
particular network under consideration. The first version is the
simplest and most robust with respect to other topological changes, such
as introduction of new nodes or branches, but requires total suspension
of distance communication for a sufficiently long period that all nodes
can be guaranteed to have reinitialized. The second method can
effectively handle branch increases and failures but can encounter
difficulties when a new link (or node) is introduced into the network.
Thus, if links are added rarely and under controlled circumstances, this
version could be appropriate. The third modified algorithm was developed
in an attempt to improve the second, but as we have often found to be
true, a modification which solves one problem can introduce a new one.
In this third version the problem of introducing new links has been
solved but certain link or node failures cannot be accommodated and
require special handling.

Each  of  these  modified  algorithms  is  based  primarily  on  some  form 
of  reinitialization  of  the  basic  static  algorithm;  they  differ  mainly 
in  the  mechanics  of  the  reinitialization.  It  is  not  sufficient  to  mere- 
ly disseminate  a reinitialization  command  throughout  the  network  when  a 
branch increase occurs. Once a node has reinitialized its distance
assessments, it needs some guarantee that subsequently received
information is itself based on reinitialized assessments, and some means
of providing this guarantee must be introduced.
Modification A

Perhaps  the  simplest  and  most  robust  approach  is  to  effectively 
suspend  all  communication  of  distance  information  for  a sufficiently 
long  period  of  time  to  insure  that  all  nodes  have  reinitialized.  Sever- 
al possible  mechanisms  for  achieving  this  present  themselves:  one  is 
for  the  node  detecting  a branch  length  increase  that  affects  any  of  his 
current  shortest  distances  to  decide  upon  a future  time  at  which  communi- 
cation of  distance  information  based  on  reinitialization  will  resume, 
and  to  send  this  to  his  neighbors  who  continue  to  propagate  it  through- 
out the  network.  Implicit  here  is  the  existence  of  a time  base  common 
to  all  nodes,  and  the  availability  to  each  node  of  (at  least  an  upper 
bound  on)  the  time  it  takes  for  the  "reinitialization  message"  he  initi- 
ates to  propagate  throughout  the  network,  which  implies  a more  global 
knowledge  of  the  network.  Since  communication  of  distance  information 
is  suspended  for  this  period,  it  is  advantageous  to  make  it  as  small  as 
possible.  This  will  clearly  be  aided  if  a mechanism  exists  for  making 
these  "reinitialization  messages"  top  priority  so  that  they  bypass  all 
queues  and  buffers  at  each  node. 

Other  mechanisms  for  achieving  the  same  basic  objective  have  been 
devised.  Together  with  that  above,  they  share  the  convergence  of  the 
static  algorithm  and  its  robustness  with  respect  to  other  topological 
changes such as introduction of new nodes or branches. The feasibility
of this approach requires that branch length increases occur
infrequently relative to the total time during which information
transmission is suspended and the static algorithm subsequently
converges.
Modification B

Instead  of  suspending  all  communication  of  distance  information  un- 
til all  nodes  can  be  guaranteed  to  have  reinitialized  and  all  distance 
information  can  be  trusted,  a mechanism  has  been  devised  for  each  node 
to  determine  which  distance  information  he  receives  is  trustworthy  (in 
that  all  nodes  further  down  the  corresponding  path  are  guaranteed  to 
have  reinitialized)  and  which  is  not.  In  simple  terms,  on  hearing  that 
a branch  length  increase  or  failure  has  taken  place,  a node  ignores  dis- 
tance information  sent  by  any  questionable  neighbor  until  that  neighbor 
acknowledges  that  he,  too,  is  aware  of  the  change.  As  each  neighbor  in 
turn  so  acknowledges,  the  embargo  on  his  information  is  removed.  In 
this  way,  some  convergence  toward  the  new  solution  can  be  taking  place 
while  news  of  the  change  is  still  propagating  through  the  network. 

More  precisely,  this  modified  algorithm  involves  the  use  of  a 
"special  action",  which,  as  in  the  previous  algorithm,  is  initiated  by  a 
node  detecting  a branch  increase  or  failure  that  affects  any  of  his 
current  shortest  distance  assessments.  When  this  occurs,  the  initiator 
assigns  a unique  index  to  the  special  action  (consisting  of  his  node 
number  and  a counter),  and  does  the  following: 

1.  Reinitializes 

2.  Places  an  embargo,  indexed  by  the  special  action,  on  distances 
received  from  every  neighbor,  except  for  the  neighbor  at  the 
opposite  end  of  the  affected  link. 

3.  Transmits  all  of  his  new  shortest  distances  to  each  neighbor, 
along  with  the  special  action  index. 
Each neighbor receiving this information takes analogous action, in
Step  2 placing  an  embargo  on  all  his  neighbors  except  for  the  one  he  has 
just  heard  from,  and  in  Step  3 using  the  same  special  action  index.  He 
then  waits  until  a message  is  received  from  a neighbor,  at  which  point 
his  action  is  governed  by  the  following: 

Case  1.  Message  contains  no  special  action  index. 

A.  If  there  is  no  embargo  on  this  node,  calculate  the  new  dis- 
tances via  this  neighbor  and  proceed  normally. 

B.  Else,  ignore  the  message. 

Case  2.  Message  contains  a special  action  index. 

A.  If  there  is  no  embargo  on  this  node  with  the  same  index  as  in 
the  message,  perform  steps  1-3  above. 

B.  Else,  remove  the  matching  ban  from  this  particular  neighbor. 
If  no  other  embargo  exists,  calculate  new  distance  via  this 
node.  Otherwise,  ignore  distance  component  of  message. 

Several  special  actions  can  exist  within  the  network  simultaneously, 
each  distinguishable  by  its  index.  A branch  failure  can  be  treated  as  a 
branch  length  increasing  to  infinity.  As  before,  to  insure  convergence 
it  is  necessary  for  the  topology  and  branch  lengths  of  the  network  to 
remain  constant  for  a long  enough  period  of  time  for  the  algorithm  to 
find  the  new  solution. 

While  this  algorithm  solves  the  problem  of  increasing  branch 
lengths,  it  introduces  a new  difficulty,  viz.  that  of  adding  new  nodes 
or  links  to  the  network.  In  the  preceding  algorithms,  bringing  up  a new 
node  or  link  causes  no  trouble.  It  introduces  new  paths,  which  can  only 
serve  to  decrease  shortest  paths  through  the  network.  Thus,  one  should 
be able to consider this in the category of decreasing branch lengths,
discussed  earlier.  However,  in  connection  with  this  particular  version 
of  the  algorithm,  adding  a new  link  can  cause  a serious  problem,  as  il- 
lustrated by  the  following  example: 

Example  2. 


Let node 5 be the destination; suppose branch length
s_45 increases to 20, and apply Modification B. Furthermore,
suppose we have reached the point at which node 4
has informed nodes 2 and 3 of special action (4;1), node
3 has acknowledged receipt of this message, but node 2
has not. Pictorially:

[Figure: the network, with the label W(4;1) marked on two branches]

where W(4;1), read "wait for (4;1)", indicates that a ban
exists on distances via the specified outgoing neighbor.
At this point, every node is aware of special action
(4;1) except node 1, who believes that his shortest
distance to node 5 is 3. Furthermore, node 3 has fulfilled
his duties with respect to action (4;1), and no longer
has any memory of it. If node 2, for whatever reason, is
slow in relaying the message concerning action (4;1) to
node 1, a problem can develop. Suppose that a pair of
links between nodes 1 and 3, each of length 1, is
introduced at this point. Node 1 sends its distance of 3 to
node 3, who in turn tells node 4 that his distance to the
destination has been cut to 4. Node 4, having already
removed the ban from node 3, passes this on to node 2.
When node 2 finally transmits the special action message
to node 1, it is accompanied by the false distance of 6.
This is clearly an undesirable situation, brought about
by the introduction of the new links between nodes 1 and
3. It happened because node 3 informed all of his
neighbors of the special action, and later acquired a new
neighbor who was unaware of the action.

Modification B can be used for networks with increasing and decreasing
branch lengths and failures, but some other technique must be utilized
to add nodes or links to the network.

Modification  C 

This  version  of  the  algorithm  is  a modification  of  the  previous 
one,  developed  to  solve  the  problem  of  adding  links.  If  a node  remem- 
bers the  indices  of  special  actions  after  he  processes  them,  he  can 
insure  that  any  new  neighbors  that  he  acquires  are  informed  of  the  most 
recent special actions. But how long must a node remember which actions
have occurred? In a network in which branch increases are a common
occurrence
Thus, in order to enable a node to decide when it is safe to forget
a given special action, we amend Modification B. Now, when a node first
hears  about  a special  action,  he  performs  the  same  steps  as  before,  with 
one  alteration.  The  node  now  remembers  which  neighbor  first  informed 
him  of  the  action,  and  may  not  send  acknowledgement  to  that  neighbor  im- 
mediately. He  tells  his  other  neighbors  about  the  action,  waits  until 
they  all  acknowledge  receipt  of  that  message,  and  then  sends  his  ac- 
knowledgement to  the  node  which  first  sent  him  word  of  the  special 
action.  The  initiator  of  the  action  plays  the  role  of  a temporary  con- 
troller. When  he  has  received  acknowledgement  from  all  of  his  neigh- 
bors, every  node  in  the  network  knows  of  the  special  action.  The  ini- 
tiator can  now  send  the  "all  clear"  signal,  allowing  each  node  to  erase 
memory  of  the  action. 
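
The acknowledgement scheme just described can be sketched as follows.
This is a minimal sketch under stated assumptions and is not the
authors' algorithm in full: it tracks only who has heard of the special
action and who has acknowledged, and it ignores distance information,
embargoes and failures. It also assumes, in the spirit of Modification
B, that any message about the action received from a neighbor that has
already been told serves as that neighbor's acknowledgement. The name
`acknowledgement_wave` and the adjacency-dictionary input are
hypothetical.

```python
from collections import deque


def acknowledgement_wave(neighbors, initiator):
    """Propagate word of one special action and collect acknowledgements.

    neighbors maps each node to a list of its neighbors (undirected,
    connected network assumed). Returns (parent, informed), where
    parent[n] is the neighbor that first informed n, and informed is the
    set of nodes that know of the action at the moment the initiator has
    heard from all of its neighbors (the "all clear" point).
    """
    parent = {initiator: None}
    pending = {initiator: set(neighbors[initiator])}  # acks still awaited
    if not pending[initiator]:
        return parent, {initiator}

    finished = set()
    informed = None
    # Messages in transit: (sender, receiver).
    queue = deque((initiator, k) for k in sorted(neighbors[initiator]))

    while queue:
        sender, receiver = queue.popleft()
        if receiver not in parent:
            # First word of the action: remember the informer and tell
            # every other neighbor.
            parent[receiver] = sender
            pending[receiver] = set(neighbors[receiver]) - {sender}
            queue.extend((receiver, k) for k in sorted(pending[receiver]))
        else:
            # Already informed: this message counts as an acknowledgement.
            pending[receiver].discard(sender)

        if not pending[receiver] and receiver not in finished:
            finished.add(receiver)
            if receiver == initiator:
                informed = set(parent)  # the initiator may send "all clear"
            else:
                queue.append((receiver, parent[receiver]))
    return parent, informed
```

In this sketch the initiator's last acknowledgement arrives only after
every node has been told, which is the property the text relies on
before the "all clear" signal is sent.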

Unfortunately,  this  modification  introduces  a new  difficulty;  link 
and  node  failures  no  longer  behave  the  same  as  branch  increases.  For 
instance,  certain  node  or  link  failures  can  disrupt  the  flow  of  acknowl- 
edgement messages,  necessitating  that  some  form  of  emergency  action  be 
taken.  We  have  developed  a mechanism  which  enables  any  node  that  de- 
tects a failure  that  could  interrupt  the  flow  of  acknowledgements  to 
initiate  this  emergency  action.  The  emergency  procedure  itself  must  be 
able  to  bring  the  network  to  a state  from  which  convergence  is  guaran- 
teed; perhaps  the  simplest  such  procedure  is  to  temporarily  invoke 
Modification  A in  those  networks  where  it  is  feasible. 
4.4 Summary

The static algorithm in Appendix 5 is the foundation upon which
each of our algorithms is based. This fundamental algorithm has
localized information and communication requirements, operates
asynchronously, and is guaranteed to find, in finite time, all shortest
paths in a static network, as well as in networks in which branch length
decreases and additions of new nodes and links take place. For networks
with increasing branch lengths or branch failures, some means of
reinitializing the algorithm is introduced. Modification A is the most
versatile of these reinitialization schemes; it can accommodate branch
increases, decreases, failures and additions, but it requires suspension
of all distance communication and shortest path calculations for a
sufficiently long period that all nodes can be guaranteed to have
reinitialized. However, this algorithm does provide a simple means of
reinitialization and is the most robust of the algorithms that we have
developed. Modification B is a more complicated algorithm. Its
reinitialization mechanism relies upon an acknowledgement system that
increases the required information storage capacity of each node. It can
accommodate branch length decreases, increases, and failures, but not
the introduction of new nodes or links. Its main advantage is the
ability to begin converging toward the new solution soon after the
reinitialization process begins; its main disadvantage is the need for
special treatment in order to insert new links. Modification C contains
a more sophisticated acknowledgement system, further increasing the
storage requirements of each node. Branch additions and branch length
increases and decreases can be accommodated, as well as certain branch
failures; a node detecting any failure is capable of determining whether
it should initiate an emergency procedure in order to guarantee
convergence. We note that algorithms utilizing acknowledgement systems
are being investigated elsewhere. For instance, Merlin and Segall [3]
have developed an algorithm which is more complicated than any of ours
and which solves a reduced version of the problem we consider.

The  investigation  of  decentralized  network  algorithms  is  continu- 
ing as  the  doctoral  research  project  of  Jeffrey  M.  Abram. 
5. A GEOMETRIC APPROACH TO SINGULAR ESTIMATION

The paper presented at the 1976 IEEE International Symposium on
Information Theory:

"A Geometric Approach to Singular Estimation Problems,"
Ian B. Rhodes, 1976 IEEE International Symposium on
Information Theory, Ronneby, Sweden, June 1976.

considered the singular estimation problem characterized by the
following question: Given the constant linear system

\[
\dot{x}(t) = Ax(t) + Dv(t)
\]

with noise-free observations

\[
y(t) = Cx(t)
\]

where v is white Gaussian noise, what states x(t) can be determined
exactly by

(a) differentiation of the current output y(t)?

(b) constructing an appropriate observer that utilizes
smoothly the past observations y over [0,t]?

Our answers to these questions concerning singular estimation were
based for the first time on a geometric viewpoint drawing on the ideas
first introduced by Wonham and Morse [4] and Basile and Marro [5].
Assuming without loss of generality that (A,D) is completely
controllable and (C,A) completely observable, the answers we found are:
(a) By differentiation, x(t) can be determined exactly modulo the
subspace W, where W is the maximal (A,D)-invariant subspace contained in
the nullspace of C. The maximal (A,D)-invariant subspace in a given
subspace was first introduced by Wonham and Morse [4] in their
geometric approach to decoupling and other problems. Specifically, W is
the largest subspace that is contained in the nullspace, N(C), of C
and satisfies $AW \subset W + R(D)$, where R(D) denotes the range of D.
It is known [4] that W can be obtained using the iterative formula

\[
W_{i+1} = N(C) \cap A^{-1}[W_i + R(D)], \qquad W_0 = N(C)
\]

which converges to the limit W in at most n-1 steps. It is also known
[6] in the single-output case (where C is a row vector) that W can be
found as

\[
W = N\!\left(\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{d} \end{bmatrix}\right)
\]

where d is the smallest integer such that $CA^{d}D \neq 0$. In the
context of singular estimation, this reflects the well-known idea that
the output is differentiated until the white noise v first appears;
because $CA^{i}D = 0$ for i < d, the first d derivatives of y(t) do not
contain v and the equation
\[
V = N\!\left(\begin{bmatrix} D' \\ D'A' \\ \vdots \\ D'(A')^{d} \end{bmatrix}\right)
\]


For either single-input or multi-input systems, V can be interpreted in
terms of the observer error, $\tilde{x}(t) = x(t) - \hat{x}(t)$, which
satisfies

\[
\dot{\tilde{x}}(t) = (A - KC)\,\tilde{x}(t) + Dv(t)
\]

The subspace V is the largest set of $\tilde{x}$'s that can be made, by
appropriate choice of the observer gain K, unaffected by the noise input
v. If the covariance of the initial state is zero along V and K is
chosen appropriately, the covariance of $\tilde{x}(t)$ remains zero along
V for all future t.

In summary, by differentiation we can determine x(t) modulo W and by an observer, assuming appropriate initial noise covariance, we can determine x(t) exactly along V. Now, if we take $W^{\perp}$ to be the prototypical set that can be determined exactly by differentiation and $V^{\perp}$ to be the prototypical subspace modulo which x(t) can be determined using an observer, then

(i) The set of states that can be determined exactly both by differentiation and by an observer is $V \cap W^{\perp} = R$. It turns out that this subspace is another fundamental subspace type introduced by Wonham and Morse: R is the maximal (A',C')-controllability subspace contained in N(D'); for a definition of this, see [4].

(ii) The set of states that cannot be determined either by differentiation or by an observer is $V^{\perp} \cap W = S$, i.e., using both differentiation and an observer, the state can be determined only modulo S. In this case, S is the maximal (A,D)-controllability subspace in N(C).

Of course, if S is simply the origin, then the entire state can be determined exactly using differentiation and an observer together. A sufficient (but not necessary) condition for this is that the system have a single "input", i.e., the noise v is real-valued and not vector-valued. Similarly, if R is simply the origin, then no state can be determined both by differentiation and by using an observer. A sufficient (but not necessary) condition for this is that the system have a single output. If both R and S are simply the origin, then all states can be determined but none by both differentiation and an observer. In other words, the state space then can be separated into a direct sum of two subspaces, one of these being the set of states determinable by differentiation and the other the set of states determinable by an observer. This will be so for single-input, single-output systems for which, as has been noted above, explicit algebraic formulas are available for both V and W.
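As an illustration of the single-output formula for W, the following minimal sketch finds the smallest d with CA^d D nonzero and stacks the rows C, CA, ..., CA^d whose nullspace is W. The three-state matrices A, C, D and the helper names are hypothetical choices made only for this example; they do not come from the report.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def first_noise_index(A, C, D):
    """Smallest d with C A^d D != 0, or None if there is no such d < n."""
    row = C                       # holds C A^i as a 1 x n matrix
    for d in range(len(A)):
        if matmul(row, D)[0][0] != 0:
            return d
        row = matmul(row, A)
    return None

def stacked_rows(A, C, d):
    """Rows C, CA, ..., CA^d; W is the nullspace of this stack."""
    rows, row = [], C
    for _ in range(d + 1):
        rows.append(row[0])
        row = matmul(row, A)
    return rows

# Hypothetical single-input, single-output example (assumed, not from the text).
A = [[0, 1, 0], [0, 0, 0], [0, 1, -1]]
D = [[0], [1], [0]]
C = [[1, 0, 1]]

d = first_noise_index(A, C, D)
rows = stacked_rows(A, C, d)
w = [[-2], [1], [2]]              # candidate spanning vector for W
annihilated = matmul(rows, w)     # zero exactly when w lies in the nullspace
print(d, rows, annihilated)
```

In this example the stack has two independent rows in a three-dimensional state space, so W is the line spanned by w.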

We have thus provided an alternative interpretation of the special properties enjoyed by single-input, single-output systems insofar as singular estimation is concerned. At the same time, we have shown that the ideas and objects of geometric systems theory are convenient and natural for solving the general multi-input, multi-output singular estimation problem.

For the discrete-time system

$$x_{k+1} = Ax_k + Dv_k$$
$$y_k = Cx_k$$

the set of states that can be determined exactly using an observer

$$\hat{x}_{k+1} = A\hat{x}_k + K[y_k - C\hat{x}_k]$$

is exactly as in the continuous-time case, viz. it depends on the covariance of the initial state, but is at most the subspace V defined earlier. However, the analog of differentiation is differencing, and the result of waiting for the output data so that differencing can be performed is that smoothing becomes involved. It is found that the continuous-time result holds true for deducing $x_k$ from future and present observation data $y_j$, $j \ge k$, i.e., $x_k$ can be determined modulo the subspace W. Because the useful data will in fact extend at most n-1 steps into the future, we can fix data availability to time k and determine $x_{k-j}$ modulo W using data to time k, for an appropriate j, $0 \le j \le n-1$. It has been our longstanding conjecture that suitable forward propagation of W or $W^{\perp}$ is intimately connected to the constant directions of the Riccati equation [7]-[9], but the exact form of this relationship has yet to be established.

For both continuous-time and discrete-time systems, corresponding new results for the singular control problem follow by standard duality arguments.

6.  COMPENSATOR  DESIGN  FOR  POLYNOMIAL  MATRIX  SYSTEM  DESCRIPTIONS
6.1  Introduction


A major conceptual development in system theory over the last couple of decades has been the replacement of the classical transfer function approach in the frequency domain by the state-space approach in the time domain. However, in the past few years, there has been a resurgent interest in frequency-domain methods. This is due in large part to the development of an alternative time-domain technique, namely the differential operator approach to the analysis and synthesis of time-invariant linear multivariable dynamical systems. Rosenbrock [10] has shown how to derive many state-space results through the analysis of certain polynomial matrices. Independently, Popov [11] has shown how such seemingly state-space-theoretic problems as the realization of systems in controllable canonical form and the determination of the controllability indices could be elegantly solved by polynomial matrix methods, starting from the transfer matrix. More recently, a significant number of investigators [12]-[20] have used polynomial matrix methods to solve other problems, employing the fact that the (p×m) transfer matrix, T(s), of a linear time-invariant multivariable system can be factored as


$$T(s) = R(s)P^{-1}(s) = P_Q^{-1}(s)Q(s) \qquad (1)$$

where R(s) and P(s) [$P_Q(s)$ and Q(s)] are relatively right [left] prime polynomial matrices in the Laplace operator s, and P(s) [$P_Q(s)$] is column [row] proper. Such a factorization directly implies a minimal time domain realization of T(s) in differential operator form, namely

$$P(D)z(t) = u(t)$$
$$y(t) = R(D)z(t) \qquad (2)$$

or

$$P_Q(D)y(t) = Q(D)u(t) \qquad (3)$$

where P(D) and R(D) [$P_Q(D)$ and Q(D)] are polynomial matrices of dimensions m×m and p×m [p×p and p×m] in the differential operator D = d/dt with P(D) [$P_Q(D)$] column [row] proper and nonsingular, z(t) is an m-vector called the partial state, u(t) is the m-vector input, and y(t) is the p-vector output.

The equivalent state-space representation of (2) is just

$$(DI - A)x(t) = Bu(t)$$
$$y(t) = Cx(t) \qquad (4)$$

where x(t) is the state vector and A, B, C are real matrices of appropriate dimensions.

In view of (1) and (2) or (3), the Laplace operator s and the differential operator D can be, and will be, freely interchanged in our subsequent discussion. It should be noted that a differential operator description of the dynamical behavior of a physical system often follows as a direct result of applying well-known physical laws to model the system.

One of the most important features of the controllable differential operator representation (2) can be seen if we consider the effect of the linear state variable feedback (lsvf) on a compensated system defined by the control law

$$u(t) = Fx(t) + Jv(t) \qquad (5)$$

in the case of state-space representation of the form (4), or

$$u(t) = F(D)z(t) + Jv(t) \qquad (6)$$

in the case of differential operator representation of the form (2). In (5), F and J are real gain matrices of appropriate dimensions and J is assumed to be nonsingular. In (6), F(D) is an arbitrary polynomial matrix having column degree less than that of P(D). A difficulty in physically implementing an lsvf control law will occur whenever the entire state of the system is not directly measurable; i.e., when only y(t) = R(D)z(t) is available for direct measurement. This problem can be circumvented in the case of an observable state-space system through the employment of a Luenberger observer. An entirely analogous result can be obtained by the differential operator approach. In this regard, the following result [12, Theorem III] is important:

Consider the differential operator representation (2). For any F(D) of lower column degree than P(D), there exists a triple {Q(D), H(D), K(D)} of polynomial matrices satisfying the following two properties:

(a) K(D)P(D) + H(D)R(D) = Q(D)F(D)

(b) $Q^{-1}(D)[H(D)\;\; K(D)]$ is an asymptotically stable proper transfer matrix.

The significance that this result has for compensator design can be seen

It is seen from the diagram that

$$u(t) = w(t) + Jv(t).$$

If w(t) exponentially approaches F(D)z(t) then the compensation scheme (6) will be realized asymptotically. A little algebra using (2) and the relation from the block diagram

$$Q(D)w(t) = K(D)u(t) + H(D)y(t)$$

shows that this is precisely what conditions (a) and (b) of the theorem ensure; furthermore, the compensator is stable and realizable because its transfer functions $Q^{-1}(s)H(s)$ (from y to w) and $Q^{-1}(s)K(s)$ (from u to w) are so specified by condition (b). A procedure for constructing a triple {H(D), K(D), Q(D)} with the two properties in the theorem has been given by Wolovich [12], using the classical "eliminant matrix" of two polynomials. There are, however, no methods available for constructing a triple with these two properties and the additional property that the determinant of Q(D) has minimum degree. This corresponds to the design of a stable, minimum-order compensator. Our research efforts have been directed towards the development of such methods, with a view to subsequent application of these ideas to other system problems.
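Condition (a) of the theorem can be checked by hand on a scalar example. The sketch below assumes a hypothetical plant with P(s) = s^2 and R(s) = 1, a feedback polynomial F(s) = -2 - 3s, and a first-order compensator denominator Q(s) = s + q with q = 5; none of these values come from the report, and the construction of K and H shown is only one way to satisfy the identity in this special case.

```python
def padd(a, b):
    """Sum of two polynomials given as coefficient lists [c0, c1, ...]."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def pmul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Hypothetical scalar data (assumed for illustration only).
P = [0, 0, 1]        # P(s) = s^2
R = [1]              # R(s) = 1
F = [-2, -3]         # F(s) = -2 - 3s, of lower degree than P
q = 5
Q = [q, 1]           # Q(s) = s + q, root at -q, so Q is stable for q > 0

f0, f1 = F
K = [f1]                       # K(s) = f1, so K/Q is strictly proper
H = [q * f0, f0 + q * f1]      # H(s) = q*f0 + (f0 + q*f1) s, so H/Q is proper

lhs = padd(pmul(K, P), pmul(H, R))   # K P + H R
rhs = pmul(Q, F)                     # Q F
print(lhs, rhs)
```

With u = P z and y = R z, the block-diagram relation Q w = K u + H y then gives Q (w - F z) = 0, so w - F z decays at the rate set by the root of Q.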

6.2  Problem  Formulation 

Consider a linear time-invariant multivariable system defined by the equations

$$P(D)z(t) = u(t)$$
$$y(t) = R(D)z(t)$$

where P(D) and R(D) are polynomial matrices of dimensions m×m and p×m, respectively, in the differential operator D = d/dt and P(D) is column proper and nonsingular. Our goal is, for any polynomial matrix F(D) of dimension m×m having lower column degree than P(D), to find a triple {H(D), K(D), Q(D)} of polynomial matrices of appropriate dimensions which satisfies the following properties:

i) H(D)P(D) + K(D)R(D) = Q(D)F(D)   (7a)

ii) $Q^{-1}(D)[H(D)\;\; K(D)]$ is an asymptotically stable proper transfer function   (7b)

iii) det Q(D) is of minimal degree   (7c)


For simplicity, the argument s or D will be omitted hereafter, since the two are interchangeable. Equation (7a) can be rewritten as

$$HP + KR - QF = 0$$

or, equivalently,

$$[P' \;\; R' \;\; -F'] \begin{bmatrix} H' \\ K' \\ Q' \end{bmatrix} = [0]. \qquad (8)$$

Let

$$T = [P' \;\; R' \;\; -F'], \qquad G = \begin{bmatrix} H' \\ K' \\ Q' \end{bmatrix},$$

where P' is row proper, $(RP^{-1})'$ is proper, and the row degree of F' is less than that of P'. Hence the polynomial matrix T is row proper and of full rank. Therefore, instead of solving (7a) for the triple {H, K, Q}, we may solve the linear equation on free polynomial modules

$$TG = [0], \qquad (8')$$

where the elements of the composite polynomial matrix G are to satisfy conditions (7b) and (7c).


6.3  The  Free  Modular  Approach 

Consider the linear equation

$$Tm = n \qquad (9)$$

where T is a linear map from the free R[s]-module M of rank r(M) to the free R[s]-module N of rank r(N). It is clear that (9) has a solution if and only if $n \in \operatorname{Im} T$. Because R[s] is a principal ideal domain, Im T and ker T are free modules. Equation (9) thus involves, for complete analysis, the computation of two bases, one for Im T, and one for ker T. We also have the customary identity

$$r(\operatorname{Im} T) + r(\ker T) = r(M).$$

Our procedure for solving (8') is first to determine a minimal reduced basis for ker T and then to use this basis to construct a polynomial matrix G which meets all the requirements.

Let n = r(ker T), and let $\{v_1, \ldots, v_n\}$ be a basis for ker T, where each element can be expressed in the manner

$$v_i = \sum_{j=0}^{r_i} v_{ij}s^{j}, \qquad v_{ir_i} \neq 0,$$

where $r_i$ is the column degree of $v_i$. We say that $\{v_1, \ldots, v_n\}$ is a minimal reduced basis for ker T if the rank of the constant matrix formed from the last m rows of

$$[v_{1r_1} \;\; \cdots \;\; v_{nr_n}] \qquad (10)$$

is equal to m and $\sum_i r_i$ is minimal.
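The quantities in this definition are easy to compute for given polynomial vectors. The following sketch assumes each polynomial is stored as a coefficient list [c0, c1, ...] and each vector as a list of such polynomials; the two vectors used are hypothetical and are not claimed to form a kernel basis for any particular T. It extracts the column degrees, the leading coefficient columns appearing in (10), and the rank of the last m rows.

```python
from fractions import Fraction

def degree(p):
    """Degree of a polynomial [c0, c1, ...]; -1 for the zero polynomial."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return -1

def column_degree(v):
    """Column degree r_i: the largest degree among the entries of v."""
    return max(degree(p) for p in v)

def leading_column(v):
    """Constant vector of coefficients of s^{r_i} (v assumed nonzero)."""
    r = column_degree(v)
    return [p[r] if r < len(p) else 0 for p in v]

def rank(M):
    """Rank of a constant matrix (list of rows) by exact elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rk, len(M)) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            f = M[r][c] / M[rk][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk

# Hypothetical polynomial vectors (assumed for illustration only).
v1 = [[1], [0, 1], [2]]            # (1, s, 2)'
v2 = [[0, 0, 1], [0], [1, 0, 1]]   # (s^2, 0, 1 + s^2)'
basis = [v1, v2]
m = 1

degrees = [column_degree(v) for v in basis]
lead = [leading_column(v) for v in basis]      # columns of the matrix in (10)
lead_rows = [list(row) for row in zip(*lead)]  # same matrix, stored by rows
last_rows_rank = rank(lead_rows[-m:])
print(degrees, lead, last_rows_rank)
```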


For a row proper and full rank matrix T, a minimal reduced basis for ker T always exists, though it is not unique. There are algorithms for constructing such bases, although we shall not present them here. Without loss of generality, we may assume that our basis is always minimal and reduced.

Returning to equation (8'), we have an equation immediately recognized as (9) with n = 0 and $T: R[s]^{2m+p} \to R[s]^{m}$ given by the matrix [P' R' -F']. The matrix G = [H K Q]' can then be constructed using the basis $\{v_1, \ldots, v_n\}$ for ker T. In view of (8), n = rank(ker T) = m + p, and the column elements of G can then be expressed as linear combinations of the basis elements $v_1, \ldots, v_n$; i.e.,

$$g_i = \sum_{j=1}^{n} a_{ij}v_j, \qquad i = 1, \ldots, m, \qquad (11)$$

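where $G = [g_i]$, and the $g_i$ and $v_i$ are partitioned conformably, with

$$g_i = \begin{bmatrix} h_i \\ k_i \\ q_i \end{bmatrix}.$$

The determinant of Q' can be written as the exterior product of the $q_i$:

$$\operatorname{Det} Q' = q_1 \wedge \cdots \wedge q_m. \qquad (12)$$

By rewriting (12) with the aid of (11), while taking due account of the multi-linear and skew-symmetric properties of the expression, it is possible to obtain the revealing form

$$\operatorname{Det} Q' = \sum b_{k(1)\cdots k(m)}(s)\; d_{k(1)}(s) \wedge \cdots \wedge d_{k(m)}(s), \qquad (13)$$

where $d_j$ denotes the block of $v_j$ conformable with $q_i$ and the sum is taken over all integer arrangements satisfying

$$1 \le k(1) < \cdots < k(m) \le n.$$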

It then becomes a matter of determining the b's in (13).

Let $Q_{m,n}$ denote the set of all strictly increasing sequences of positive integers r, of length m, chosen from 1, ..., n; i.e., $1 \le r(1) < \cdots < r(m) \le n$. Because the exterior products in (13) are known, finding the b's in (13) is essentially a decomposition problem. Select a basis $\{v_1, \ldots, v_n\}$ for an F-vector space V; then an m-vector has the general form (13) for the sum taken over all integer arrangements in $Q_{m,n}$ and with the b's in F. Decomposition means that m vectors

$$\hat{v}_i = \sum_{j=1}^{n} a_{ij}v_j, \qquad a_{ij} \in F, \qquad (15)$$

can be found in V such that the exterior product

$$\hat{v}_1 \wedge \cdots \wedge \hat{v}_m$$

is equal to the m-vector in question.

The mathematical literature dealing with the construction of the a's from the b's tends to be framed in a vector space, rather than a free-module, context. For example, the following result is known [21, p. 568]:

Proposition: In an (m+1)-dimensional F-vector space V, every m-vector is decomposable.

An extension of this result to free R[s]-modules has been given by Sain [13]. Using this extension, a solution to the compensator design problem has been constructed for single-output systems.
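The expansion (13) is an instance of the Cauchy-Binet formula: when the columns of Q' are linear combinations of n given columns, Det Q' is a sum over increasing index sequences of products of m×m minors. The sketch below checks this numerically with constant (degree-zero) entries; the matrices Dm and A are hypothetical stand-ins for the q-blocks of the basis vectors and for the coefficients a_ij, chosen only for illustration.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def columns(M, idx):
    """Submatrix of M made of the columns listed in idx."""
    return [[row[j] for j in idx] for row in M]

# Hypothetical data with m = 2, n = 3 (assumed for illustration only).
m, n = 2, 3
Dm = [[1, 0, 2],
      [0, 1, 3]]          # column j holds the block d_j
A = [[1, 2, 0],
     [0, 1, -1]]          # row i holds the coefficients a_i1, ..., a_in

# Q' has columns q_i = sum_j a_ij d_j.
Qt = [[sum(Dm[k][j] * A[i][j] for j in range(n)) for i in range(m)]
      for k in range(m)]

direct = det(Qt)
# Sum over increasing sequences r of b_r * d_r, with b_r = det A_r.
expansion = sum(det(columns(A, r)) * det(columns(Dm, r))
                for r in combinations(range(n), m))
print(direct, expansion)
```

Both values agree, which is the constant-coefficient analogue of reading off the b's as the m×m minors of the coefficient matrix.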


6.4  Single-Output  Systems

For single-output systems, p = 1 and so m = n-1, and a polynomial matrix $A(s) = [a_{i,j}(s)]$ can be found such that

i) $b_r = \operatorname{Det} A_r$, $r \in Q_{m,n}$,

ii) $G = [v_1 \;\cdots\; v_n]A$,

iii) $Q^{-1}[H \;\; K]$ is proper,

where $A_r$ is the submatrix of A consisting of the r(1), ..., r(m)-th columns of A.


The  matrix  A(s)  can  be  determined  in  one  or  two  steps  depending  on 
the  solution  to  (13) . Suppose  that  we  have  a solution  b to  (13)  for  a 
desirable  polynomial  Det  Q'  . Step  1 is  to  define  a matrix  B as  follows 


^2  “ ^3''^! 


where $b_i$ is the shorthand notation for $b_{r_i}$, and $r_i$ is the i-th element of $Q_{m,n}$ when its elements are ordered lexicographically. If $b_1$ is a constant, we may simply let A be equal to B; otherwise, step 2 is necessary to convert the rational matrix B to a polynomial matrix. Step 2 is to determine the matrix A from the fact that

$$(\hat{v}_1 \wedge \cdots \wedge \hat{v}_m) \wedge \hat{v}_i = 0, \qquad i = 1, \ldots, m,$$

where $\hat{v}_i$ is defined by (15) except that the $a_{i,j}$ are elements of R[s] and $\{v_1, \ldots, v_n\}$ is a minimal reduced basis for V = ker T(s); i.e.,

$$\Big(\sum_{r_i \in Q_{m,n}} b_{r_i}v_{r_i}\Big) \wedge \Big(\sum_{i=1}^{n} a_{i,j}v_i\Big) = 0,$$

where $v_{r_i}$ denotes the exterior product of the m elements of $\{v_1, \ldots, v_n\}$ corresponding to $r_i \in Q_{m,n}$. Because $v_r \wedge v_i = 0$ if i is in the sequence r for all $r \in Q_{m,n}$ and $v_1 \wedge \cdots \wedge v_n$ cannot vanish because $\{v_i\}$ is a basis for V, we have

$$\sum_{i=1}^{N} (-1)^{i-1} b_i a_{i,j} = 0, \qquad j = 1, \ldots, m,$$

where $N = \binom{n}{m}$ is the number of sequences in $Q_{m,n}$. We are then essentially looking for a polynomial basis for

$$\ker\,[b_1, -b_2, \ldots, (-1)^{N-1}b_N] \qquad (19)$$

in the free R[s]-module sense. If one of the non-zero b is a constant, then, by reordering $\{v_i\}$ such that $b_1$ is that constant, the column elements of B defined in (17) form a basis for the kernel (19). Otherwise B is a rational matrix and the column vectors of B form a basis for that kernel in an R(s)-vector space context but not necessarily a free modular basis. An algorithm has been developed to construct a free modular basis from the B matrix.


6.5  Multi-Output  Systems 

For multi-output systems, p > 1 and so m < n-1, and the free R[s]-module extension of the Proposition given at the end of Section 6.3 is no longer applicable. Generalizations of that Proposition in a vector space setting are known, e.g., Theorem 1.4 in [22], but we have so far been unable to establish any extension to a free R[s]-module setting. That such an extension should be possible, at least under certain circumstances, is illustrated by the following alternative approach.

Consider the possibility of triangularizing the transfer function matrix by precompensation; since the set of proper, stable, real, rational transfer functions forms a Euclidean domain, it is possible to triangularize a transfer function matrix by postmultiplying it by a unimodular matrix of rational functions [23], thus transforming a multi-input, multi-output system into a sequence of multi-input, single-output systems. Postmultiplication by a unimodular matrix amounts to allowing "input dynamics" in the compensation scheme.

It can be shown that, for a minimal system described by (2), the exterior products $d_r$, for all $r \in Q_{m,n}$, are relatively prime polynomials. Hence, the determination of the minimum degree of the Q' matrix is essentially to find $a_r(s)$, $r \in Q_{m,n}$, and the smallest integer k such that

i) $c_0 + c_1 s + \cdots + c_k s^{k} = \sum_{r \in Q_{m,n}} a_r(s)d_r(s)$ is a stable polynomial of degree k, and

ii) $\deg a_r(s) + \deg d_r(s) \le k$.

This problem seems to be related to the generalization of the so-called "eliminant matrix" of two polynomials.

Further investigation of these questions is being carried on as the doctoral research project of Olive Y. Liu.


7.  ESTIMATION  AND  STOCHASTIC  CONTROL;  THE  INNOVATIONS  CONJECTURE

7.1  Introduction

The informational equivalence of the "signal in additive white noise" observation process

$$z_t = \int_0^t y_s\,ds + w_t \qquad (1)$$

and the innovations process

$$\nu_t = z_t - \int_0^t \hat{y}_s\,ds = \int_0^t \tilde{y}_s\,ds + w_t \qquad (2)$$

is known to hold under certain conditions and to fail to hold under others; an account of these can be found in Benes [24]. Two of the most important sets of conditions under which informational equivalence holds are:

1. The "signal" y is a second-order process with $E\int_0^t |y_s|^2\,ds < \infty$, the "noise" w is an uncorrelated-increment second-order process, future noise is uncorrelated with past signal, and our concern is with linear least-squares estimation. Specifically, let $Z_t$ be the closed linear subspace of $H = L_2(\Omega,F,P)$¹ generated by $\{z_s^i,\ s \in [0,t],\ i = 1, 2, \ldots, m = \dim z\}$. Then $\hat{y}_s$ in (2) is the projection $E[y_s | Z_s]$ of $y_s$ on $Z_s$, $\tilde{y}_s = y_s - \hat{y}_s$, and informational equivalence means that $Z_t = N_t$ for all t, where $N_t$ is the closed linear subspace of H generated by $\{\nu_s^i,\ s \in [0,t],\ i = 1, 2, \ldots, m\}$.

2. The "signal" y and the "noise" w are jointly-Gaussian second-order processes, y is almost-surely square-integrable on any finite interval, w is a Wiener process whose future increments are independent of past y, and our concern is with least-squares estimation. Specifically, let $\mathcal{Z}_t$ be the sub-σ-algebra of F generated by $\{z_s^i,\ s \in [0,t],\ i = 1, 2, \ldots, m\}$. Then $\hat{y}_s$ in (2) is the conditional expectation $E[y_s | \mathcal{Z}_s]$, $\tilde{y}_s = y_s - \hat{y}_s$, and informational equivalence means $\mathcal{Z}_t = \mathcal{N}_t$ (mod P) for all t, where $\mathcal{N}_t$ is the sub-σ-algebra of F generated by $\{\nu_s^i,\ s \in [0,t],\ i = 1, 2, \ldots, m\}$.

¹ For the complete probability space $(\Omega,F,P)$, $H = L_2(\Omega,F,P)$ is the Hilbert space of real-valued, zero-mean, finite-variance random variables on $(\Omega,F,P)$ with inner product <u,v> = E(uv).

We have constructed a new direct proof of innovations equivalence for these two cases using only elementary facts from stochastic processes and estimation theory such as those in the book [25]. This contrasts with most of the existing proofs, which involve deep and sophisticated results in the theory of stochastic processes. In terms of directness, generality and assumed background, our proof is comparable to that recently published in [26]; the principal argument in the proof, however, is entirely different.

A paper presenting these proofs in detail is in preparation. We summarize here the essential arguments involved.

7.2  Linear  Least-Squares  Estimation

The essential ingredient of the proof is to introduce the process μ defined by

$$\mu_t = z_t - \int_0^t \hat{y}_s^{N}\,ds \qquad (3)$$

where $\hat{y}_s^{N} = E[y_s | N_s]$. Recalling that $N_t \subset Z_t$, it is immediately clear from (3) that $M_t \subset Z_t$, and thus that $M_t + N_t \subset Z_t$, where $M_t$ is the subspace of H generated by $\{\mu_s^i,\ s \in [0,t],\ i = 1, 2, \ldots, m\}$. On the other hand, rewriting (3) as

$$z_t = \mu_t + \int_0^t \hat{y}_s^{N}\,ds$$

it is seen immediately that $Z_t \subset M_t + N_t$. Thus

$$Z_t = M_t + N_t. \qquad (4)$$

This fact, that knowledge of the past of both μ and ν is equivalent to knowledge of the past of z, is the very reason the process μ is introduced.

Now, substitution of (2) into (3) gives

$$\mu_t = \int_0^t (\hat{y}_s^{Z} - \hat{y}_s^{N})\,ds + \nu_t \qquad (5)$$

It is easily proved that the processes $(\hat{y}^{Z} - \hat{y}^{N})$ and ν appearing on the right side of (5) are uncorrelated, and by an easily proved result [25, Lemma 4.3.2] the subspace $M_t$ can then be represented as the set of integrals

$$m_t = \int_0^t b'(s)[\hat{y}_s^{Z} - \hat{y}_s^{N}]\,ds + \int_0^t b'(s)\,d\nu_s; \qquad b \in L_2^{m}(0,t]$$

This, in fact, is a generalization of the standard result that $N_t$ can be represented as the set of Wiener integrals on ν, i.e.,

$$n_t = \int_0^t a'(s)\,d\nu_s; \qquad a \in L_2^{m}(0,t]$$

In view of (4), any vector in $Z_t$ can then be represented as a vector sum $m_t + n_t$, where $m_t$ and $n_t$ have representations as above, i.e.,

$$m_t + n_t = \int_0^t b'(s)[\hat{y}_s^{Z} - \hat{y}_s^{N}]\,ds + \int_0^t c'(s)\,d\nu_s; \qquad b, c \in L_2^{m}(0,t] \qquad (6)$$

In particular, $(\hat{y}_t^{Z} - \hat{y}_t^{N})$ is in $Z_t$ and has a representation of this form; also, because $\hat{y}_t^{N}$ is the projection of $\hat{y}_t^{Z}$ on $N_t$, $(\hat{y}_t^{Z} - \hat{y}_t^{N})$ is orthogonal to $N_t$ and c' = 0. Thus

$$\hat{y}_t^{Z} - \hat{y}_t^{N} = \int_0^t b'(t,s)[\hat{y}_s^{Z} - \hat{y}_s^{N}]\,ds \qquad (7)$$

After it is shown that

$$\int_0^T\!\!\int_0^t |b(t,s)|^2\,ds\,dt < \infty$$

using $E\int_0^T |y_t|^2\,dt < \infty$, it then follows by any of a number of arguments (Contraction Mapping Theorem, Picard Iterations, Inversion of Volterra Operators) that $\hat{y}_s^{Z} - \hat{y}_s^{N} = 0$ for all s. Then, immediately from (6), (7) and (4),

$$Z_t = N_t,$$

which is the desired result.


7.3  Estimation  of  Gaussian  Processes 

Once causal equivalence has been established for linear least-squares estimation, the corresponding result for the case where y and w are jointly Gaussian follows readily by arguments such as those in Section III of [26]. The proof uses the fact that, for y Gaussian,

$$\int_0^t |y_s|^2\,ds < \infty \ \text{a.s.} \iff E\int_0^t |y_s|^2\,ds < \infty$$

to provide a bridge between the two cases.

As mentioned above, a paper presenting these arguments in detail is in preparation.


8.1  Introduction 


Our long-term objective in this recently-begun research effort is to develop quantitative measures of controllability and observability, and to use these as a basis for providing an analytical framework for design and performance evaluation, especially for large-scale or decentralized systems.

The large and highly-developed body of knowledge concerning the structural properties of linear systems is framed almost entirely in terms of "yes" or "no" questions and answers; a state is either reachable or it is not, an input disturbance is either localized away from an output or it is not, a system is decoupled or it is not; in each case, available characterizations afford conditions that can be checked to determine which of the two holds true. Almost all of these involve, directly or indirectly, the controllability and observability properties of the system. There is, however, no body of knowledge relating to the approximate achievement of these goals, or, more generally, to the degree to which they are achieved. For many practical purposes it is sufficient if, for example, an input has an acceptably small influence on an output, and it is not necessary for this influence to be zero. Especially in large systems, some measure of the degree of interaction or noninteraction between subsystems seems essential for analyzing the system, for designing decentralized estimation or compensation schemes, and for assessing the performance of these estimators and controllers (in terms, say, of performance bounds).

We present here some preliminary results from our initial investigation of these questions. For simplicity of presentation, we concentrate on discrete-time constant systems, with occasional references to the continuous-time or time-varying counterparts of these. The next section introduces the measures of controllability and observability we have been working with to this point, while Section 8.3 presents some implications of these in measuring interaction between input and output, and in providing lower bounds for smoothing problems, in order to provide simple illustration of some of the consequences of these controllability and observability measures.




8.2  Measures  of  Reachability  and  Observability 
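Consider the constant, discrete-time system

$$x_{k+1} = Ax_k + Bu_k \qquad (1a)$$
$$y_k = Cx_k \qquad (1b)$$

with $x_k \in R^{n}$, $u_k \in R^{m}$ and $y_k \in R^{p}$. As in [27], we apply the input $u_k$ over [-n,-1], starting with $x_{-n} = 0$, and observe the output $y_k$ over [0,n-1]. Letting

$$u' = [u'_{-1} \;\cdots\; u'_{-n}], \qquad y' = [y'_0 \;\; y'_1 \;\cdots\; y'_{n-1}],$$

$$F = [B,\ AB,\ \ldots,\ A^{n-1}B], \qquad H = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix}, \qquad (2)$$

we then have that the state at time 0 is given by

$$x_0 = Fu \qquad (3)$$

while the output over [0,n-1] is, in terms of $x_0$,

$$y = Hx_0 \qquad (4)$$

or, in terms of u,

$$y = HFu \triangleq Gu \qquad (5)$$

where

$$G = \begin{bmatrix} CB & CAB & \cdots & CA^{n-1}B \\ CAB & CA^{2}B & & \vdots \\ \vdots & & \ddots & \\ CA^{n-1}B & \cdots & & CA^{2n-2}B \end{bmatrix} \qquad (6)$$

is the Hankel matrix associated with the system.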

One natural way to measure the reachability of a given state x at time 0 is as the maximum value of the inner product $x'x_0$ over all $x_0$ of the form (3) with $\|u\|^2 = u'u \le 1$, i.e.,

$$r(x) = \sup_{u'u \le 1} x'Fu = \sqrt{x'FF'x} \qquad (7)$$

where the second equality follows as a result of performing the simple maximization. If x has norm 1, this reduces to maximizing the projection along x of all states reachable with at most one unit of input energy.

We have r(x) = 0 if and only if x is unreachable in the standard sense of the term, i.e., orthogonal to the set of reachable states, so that the inner product in (7) is identically zero. Large values of r(x) correspond in some sense to states that are easily reached, though not in the classic sense of the term, since such states may have unreachable components.

It is easily seen that r(x) is a sublinear function of x. It is also convex, so that, in particular, the set of states whose reachability measure is no greater than α, i.e., $\{x: r(x) \le \alpha\}$, is a convex set for all α > 0.
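The closed form in (7) follows from the Cauchy-Schwarz inequality, with the maximizing input u = F'x / ||F'x||. The sketch below evaluates r(x) for a small system and checks that this input attains the supremum; the matrices A and B, the state x, and the helper names are hypothetical and serve only as an illustration.

```python
import math

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def reachability_matrix(A, B, n):
    """F = [B, AB, ..., A^{n-1}B], returned as a list of rows."""
    blocks, blk = [], B
    for _ in range(n):
        blocks.append(blk)
        blk = matmul(A, blk)
    return [sum((b[i] for b in blocks), []) for i in range(len(B))]

def apply_transpose(F, x):
    """The vector F'x."""
    return [sum(F[i][j] * x[i] for i in range(len(x)))
            for j in range(len(F[0]))]

def r(x, F):
    """Reachability measure r(x) = sqrt(x'FF'x) = ||F'x||."""
    return math.sqrt(sum(v * v for v in apply_transpose(F, x)))

# Hypothetical two-state, single-input system (assumed for illustration).
A = [[1, 1], [0, 1]]
B = [[0], [1]]
F = reachability_matrix(A, B, 2)

x = [3, 4]
Ftx = apply_transpose(F, x)
rx = r(x, F)
u_star = [v / rx for v in Ftx]                       # unit-energy maximizer
attained = sum(a * b for a, b in zip(Ftx, u_star))   # x'F u_star
other = sum(a * b for a, b in zip(Ftx, [1.0, 0.0]))  # another unit input
print(F, rx, attained, other)
```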

The corresponding dual observability measure of a state x at time 0 is simply the norm of the output sequence y that is produced by x, i.e.,

$$o(x) = \|y\| = \|Hx\| = \sqrt{x'H'Hx} \qquad (8)$$

We note that o(x) is zero iff the state x is unobservable in the standard sense of the term. Large values of o(x) mean that x gives rise to an output sequence y with large norm, and in this sense x is highly observable. The observability measure o(x) is sublinear and convex; the set of all states whose observability measure is less than a given β is convex; i.e., $\{x: o(x) < \beta\}$ is convex.

These measures, r(x) and o(x), preserve suitable duality conditions. It is easily checked that the reachability measure, $r^{d}(x)$, of the system dual to (1) is simply the observability measure o(x) of (1). Conversely, the observability measure, $o^{d}(x)$, of the system dual to (1) is the reachability measure of (1).


It is convenient, in order to avoid the square roots in (7) and (8), to
work with the squares

$$R(x) = r^2(x), \qquad O(x) = o^2(x) \qquad (9)$$

Another reachability measure is the conjugate functional of R. The
conjugate functional of a convex functional R is defined by [see, e.g.,
28],

$$R^*(z) = \sup_x\, [z'x - R(x)] \qquad (10)$$

Performing the indicated maximization with R(x) given by (7) and (9), we
find

$$R^*(z) = \tfrac{1}{4}\, z'(FF')^{-1}z, \qquad (11)$$

assuming that FF' is invertible; if it is not, $R^*(z)$ is still defined
by (10) and takes the value $\infty$ if z lies in the nullspace of F'.
For simplicity of notation, we assume here that FF' is invertible.
Henceforth, we also neglect the constant factor 1/4 in (11) and take as
our alternative reachability measure

$$R^*(z) = z'(FF')^{-1}z \qquad (11')$$

This reachability measure (11') has a simple interpretation in terms of
the system (1): it is the minimum amount of control energy u'u needed to
reach state z at time 0 from state 0 at time -n, i.e.,
$\min\{u'u : Fu = z\}$. This minimum energy is known from standard
least-squares linear system theory to be just

$$u^{o\prime}u^{o} = R^*(z) = z'(FF')^{-1}z.$$
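The minimum-energy interpretation can be checked numerically. The sketch
below, which assumes a hypothetical 2 x 3 matrix F with FF' invertible,
forms the least-squares input u = F'(FF')^{-1}z, confirms that it reaches
z, and compares its energy with that of another input reaching the same
state. The matrix, the target state, and the helper names are
illustrative assumptions, not quantities from the report.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("FF' is singular: some states are unreachable")
    return [[d / det, -b / det], [-c / det, a / det]]


def min_energy_input(F, z):
    """Return (u, u'u) for the minimum-norm u with F u = z (2-state case)."""
    Ft = transpose(F)
    w = matvec(inverse_2x2(matmul(F, Ft)), z)  # w = (FF')^{-1} z
    u = matvec(Ft, w)                          # u = F'(FF')^{-1} z
    return u, sum(c * c for c in u)


# Hypothetical 2-state system driven over 3 time steps.
F = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 2.0]]
z = [1.0, 0.0]

u_min, energy = min_energy_input(F, z)  # energy equals z'(FF')^{-1}z

# Any other input reaching z differs from u_min by a nullspace vector of F.
null_vec = [1.0, -2.0, 1.0]  # satisfies F null_vec = 0 for this F
u_other = [a + 0.1 * b for a, b in zip(u_min, null_vec)]
energy_other = sum(c * c for c in u_other)
```

For this F and z the unreachability measure is R*(z) = 5/6, and the
perturbed input `u_other` reaches the same state at strictly higher
energy.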


Thus the measure $R^*$ given by (11') is in fact a measure of
unreachability: small values of $R^*$ correspond to small values of
required input energy, and thus to relatively easily reached states;
large values of $R^*$ correspond to large required input energies and
thus to states that are more difficult to reach; in particular, states in
the nullspace of F' (and thus unreachable according to the "standard"
definition) have $R^*(z) = \infty$. We observe that $R^*$ is convex, as
are all conjugate functionals of convex functionals.

The corresponding conjugate functional of O is, again neglecting the
constant factor of 1/4 as in (11'),

$$O^*(z) = z'(H'H)^{-1}z$$


An interpretation of this in terms of the system (1) follows by
considering the problem of minimizing the norm of the linear functional
on y that produces the projection (or, more generally, the inner product)
of the initial state x on the vector z. Suppose x'z is found as the
linear functional w'y; the vector w of minimum norm that accomplishes
this is $w^o = H(H'H)^{-1}z$, so that
$w^{o\prime}w^o = z'(H'H)^{-1}z = O^*(z)$. We thus see that $O^*$ gives,
in fact, a measure of unobservability: small values of $O^*(z)$
correspond to a small effort to determine x'z and thus to a "more
observable" z than do large values of $O^*(z)$. Note that
$O^*(z) = \infty$ if z is in the nullspace of H, corresponding to a state
that is unobservable in the standard sense of the term. As with all our
measures, $O^*$ is convex.
The measures $O^*$ and $R^*$ also preserve appropriate duality
relations: the observability measure $O^*$ of the dual to system (1) is
the reachability measure $R^*$ of system (1), and vice versa.

As a means of making more concrete the relationship between R and $R^*$,
observe that if FF' is diagonalized by the similarity transformation T,
so that $T'FF'T = \Lambda$, then

$$R(x) = x'FF'x = (T'x)'\Lambda(T'x) = \sum_{i=1}^{n} \lambda_i (T'x)_i^2$$

whereas

$$R^*(x) = x'(FF')^{-1}x = (T'x)'\Lambda^{-1}(T'x) = \sum_{i=1}^{n} \lambda_i^{-1} (T'x)_i^2$$

This makes clearer the earlier observation that R is a measure of
reachability, whereas $R^*$ is a measure of unreachability.

Finally, we note that the continuous-time analog of

$$FF' = \sum_{i=1}^{n} \cdots$$

is the "controllability Grammian"


8.3  Some  Implications 

We present here two simple implications of the above measures of
controllability and observability. The first of these concerns the
interaction between an input and an output of a system of the form (1).
We emphasize that, in this context, u in (1) may be simply a part of the
total input and y in (1) may be simply part of the total system output,
so that we are thinking in terms of possible applications to disturbance
rejection or decoupling. We adopt as a measure of interaction between
input and output the maximum norm of the output sequence y that can be
produced by an input sequence u with $u'u \le 1$, i.e.


$$\max\{\|y\| : y = Hx_0,\ x_0 = Fu,\ u'u \le 1\} = \max_{\|u\| \le 1} \|HFu\| = \|HF\|$$


(Recall that HF is the Hankel matrix of the system (1).) A number of
equivalent expressions for this can be given in terms of the above
measures; each, upon reflection, has its own intuitive interpretation and
provides some insight into the underlying measures. For example

$$\|HF\|^2 = \max_{u'u \le 1} u'F'H'HFu = \max_{u'u \le 1,\ x = Fu} x'H'Hx = \max_{R^*(x) \le 1} O(x)$$


where the last equality follows after a few lines of algebra and the use
of the definition of $R^*$. One intuitive interpretation of this
expression is that large interaction between input and output is a
consequence of at least some states whose observability measure O(x) is
large also being reasonably reachable (in the sense that the
unreachability measure $R^*(x)$ is no greater than 1). If all states
having high observability, as measured by O(x), also have low
reachability (as measured by a high unreachability $R^*(x)$), then
input-output interaction will be small. That interaction between input
and output should depend on the reachability and observability of the
states is to be expected. What is important about the above expression
for $\|HF\|^2$ is that it quantifies this dependence in terms of
specified measures of reachability and observability.
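The identity between $\|HF\|^2$ and the maximum of O(x) over states with
$R^*(x) \le 1$ can be illustrated numerically. The sketch below assumes a
hypothetical 2-state example in which F is square and invertible, so that
every x = Fu with u'u = 1 has R*(x) = 1 exactly; since O is a quadratic
form, its maximum over the set R*(x) <= 1 is attained on this boundary.
The matrices H and F and the function names are assumptions made for
illustration.

```python
import math


def hankel_norm_sq(H, F):
    """||HF||^2: largest eigenvalue of (HF)'(HF) for 2 x 2 matrices H and F."""
    HF = [[sum(H[i][k] * F[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    a = HF[0][0] ** 2 + HF[1][0] ** 2
    b = HF[0][0] * HF[0][1] + HF[1][0] * HF[1][1]
    d = HF[0][1] ** 2 + HF[1][1] ** 2
    # Closed-form largest eigenvalue of the symmetric matrix [[a, b], [b, d]].
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b * b)


def max_observability_on_boundary(H, F, samples=3600):
    """Maximize O(x) = x'H'Hx over sampled states x = Fu with u'u = 1."""
    best = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        u = (math.cos(theta), math.sin(theta))
        x = [F[i][0] * u[0] + F[i][1] * u[1] for i in range(2)]
        y = [H[i][0] * x[0] + H[i][1] * x[1] for i in range(2)]
        best = max(best, y[0] ** 2 + y[1] ** 2)
    return best


# Hypothetical example: one strongly and one weakly reachable direction.
F = [[2.0, 0.0],
     [0.0, 0.5]]
H = [[1.0, 1.0],
     [0.0, 1.0]]

exact = hankel_norm_sq(H, F)
sampled = max_observability_on_boundary(H, F)
```

The sampled maximum approaches the exact value from below as the number of
sample angles grows.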


The corresponding dual expression is

$$\|HF\|^2 = \max_{O^*(x) \le 1} R(x)$$

and this has a dual interpretation to that above.

Taking a Lagrange multiplier approach to performing the maximization in
(13) yields, after a few lines of algebra,

$$\|HF\|^2 = \sup\{\mu : FF' - \mu (H'H)^{-1} \text{ is singular}\}$$

while the corresponding dual expression is

$$\|HF\|^2 = \sup\{\nu : H'H - \nu (FF')^{-1} \text{ is singular}\}$$

In either case, the eigenvector corresponding to the zero eigenvalue so
created might be thought of as the state through which maximum
input-output interaction takes place.

An alternative expression for $\|HF\|^2$ can be given in terms of the
measures o and r. It can be shown after some algebra that

$$\|HF\|^2 = \max\{o(x)\, r(x)\}$$


Again, this provides the interpretation that low input-output
interaction requires easily-reached states (in the sense of having large
r(x)) to have low observability (in the sense of small o(x)) and
easily-observed states to have low reachability. The quantification of
this expected intuition, and the measures involved, are quite different
from those given earlier.

We conclude this section with the second of our two simple applications
of the measures of controllability and observability, in this case to
providing bounds on the error variance in smoothing problems. Consider
the discrete-time stochastic system


$$x_{k+1} = Ax_k + v_k$$

$$y_k = Cx_k + w_k$$


where v and w are independent white noise sequences, and the initial
state is taken to be unknown. Let the error covariance in the
Gauss-Markov (smoothing) estimate of $x_0$ given $y_0, \ldots, y_{n-1}$
be $\Sigma$. It is easily shown that

$$\Sigma^{-1} \le H'H, \qquad \Sigma \ge (H'H)^{-1}$$

where the matrix inequalities denote the usual partial ordering:
$P \ge Q$ iff P-Q is nonnegative definite. The above bounds on $\Sigma$
and $\Sigma^{-1}$ are, in fact, simply the Cramer-Rao bound for this
problem.

In particular,

$$x_0' \Sigma^{-1} x_0 \le O(x_0),$$

and

$$z' \Sigma z \ge O^*(z);$$

$O^*(z)$ thus provides (if z is a unit vector) a bound on the error
variance in the direction z: the smaller is $O^*(z)$, and thus the more
observable is z, the smaller is the lower bound on the error variance in
this direction.

These  two  simple  examples  are  intended  merely  to  illustrate  the 
kind  of  results  that  follow  from  an  analysis  in  terms  of  quantitative 
measures  of  controllability  and  observability.  Research  in  this  area  is 
continuing. 
9. SUMMARY

Two of the research topics included in this report are concerned with
estimation, decision, and control problems for observation models other
than the familiar "signal in additive white (Gaussian) noise" one. Both
involve observations of a doubly-stochastic point process. In one problem
we derive and examine the performance of optimal and suboptimal
estimation and tracking systems when available observations include a
space-time point process; the optimal estimators and controllers are
shown to be nonlinear but finite-dimensional (and therefore
implementable), and their performance is analyzed in terms of upper and
lower bounds, the upper bounds giving the performance of suboptimal
schemes that are even more easily implemented than the corresponding
optimum. In the second problem, we derive optimal modulation and
demodulation systems for coded, direct-detection optical communication
systems under various conditions on the average energy and peak amplitude
of the transmitted optical signal; here the received data is a point
process whose intensity is signal-dependent. A modulation scheme we show
to be optimum when average energy constraints are a limiting factor is,
in fact, the one employed in a one-gigabit-per-second satellite optical
communication system currently under development.

Algorithms have been derived that enable each node in a network to
compute its shortest distance to any other node using only local
topological information and decentralized information transfer between
adjacent nodes. Shortest path algorithms with such decentralized
information requirements are of obvious importance in many applications,
including C³-systems. A number of modifications of our basic algorithm
have been derived, all retaining the basic informationally-decentralized
characteristics, but each with its own advantages and limitations in
handling various topological changes in the network.

Singular estimation and control problems have been examined from a
geometric viewpoint, and various subspaces that are fundamental to the
geometric approach to system theory are shown to provide simple, concise
solutions to the singular estimation and control problems. These
solutions are the same for multi-input, multi-output systems as for
single-input, single-output ones; the geometric solution reduces directly
to well-known algebraic solutions in the latter case.

Compensator design methods have been investigated for multivariable
systems represented in polynomial matrix form. Recent years have seen a
reawakening of interest in frequency-domain design techniques, in
contrast to the time-domain methods that have predominated for the past
two decades, and these have been based principally on polynomial matrix
system descriptions. The theory underlying our design procedures draws on
the ideas and results of modern algebra, especially multilinear algebra.

A new, direct proof has been derived of the known causal equivalence of
the innovations and observations processes for linear estimation and for
Gaussian processes.

Finally, we have presented some preliminary results from our
recently-begun research effort to develop quantitative measures of
controllability and observability that have consequences in terms of
measuring such properties as the degree of interaction or noninteraction
between input and output and in deriving bounds on estimator or
controller performance. Of particular interest in the longer term are
applications to large, decentralized system problems.
Summary: The exact solution is derived for a stochastic
optimal  control  problem  involving  a linear  stochastic  plant, 
quadratic  costs,  and  nonlinear,  nongaussian  observations. 
The  observations  are  in  the  form  of  a point  process  in  which 
each  point  has  both  a temporal  and  a spatial  coordinate.  The 
state  of  the  stochastic  plant  influences  the  intensity  of  the 
observed  time-space  point  process.  The  solution  to  this  dual 
control  problem  can  be  realized  with  a separated  estimator- 
controller  in  which  the  estimator  is  nonlinear,  mean-square 
optimal,  and  finite  dimensional,  and  the  controller  is  the 
certainty  equivalent  linear  controller.  Motivation  for  the 
stochastic  optimal  control  problem  studied  here  is  given  in 
terms  of  position  sensing  and  tracking  for  quantum-limited 
optical  communication  problems. 

1. Introduction

The  most  general  stochastic  optimal-control  problem  is  a 
so-called  dual  control  problem  which  has  been  solved  only 
under  very  restrictive  conditions.  Of  special  importance  is 
the  separation  theorem  which  demonstrates  that  for  a linear 
stochastic  plant,  quadratic  costs,  and  linear  observations  in 
additive  Gaussian  noise,  the  optimal  control  law  can  be 
determined  by  solving  separately  and  independently  a causal 
stochastic  estimation  problem  and  a deterministic  control 
problem.  In  this  paper,  we  demonstrate  that  a similar 
separation  holds  for  the  exact  solution  to  a dual  control 
problem  involving  a linear  stochastic  plant,  quadratic  costs, 
and  nonlinear,  nongaussian  observations.  The  observations 
are  in  the  form  of  a point  process  in  which  each  point  has 
both  a temporal  and  a spatial  coordinate.  The  state  of  the 
stochastic  plant  influences  the  intensity  of  the  observed 
time-space  point  process.  We  show  that  the  solution  to  this 
dual  control  problem  can  be  realized  with  a separated 
estimator-controller  in  which  the  estimator  is  nonlinear, 
mean-square optimal, and finite-dimensional, and the controller
is the certainty-equivalent linear controller. Motivation
for  the  dual  control  problem  is  given  in  terms  of  optical 
position  sensing  and  tracking. 
The point-process observation-model we adopt here is a
generalization of that in [1] to include feedback interactions
between the observed points and the state of the linear
stochastic plant. This interaction implies that the plant state is
not a Gaussian process. Even so, we find as in [1] that at any
time the state is conditionally Gaussian given observations of
the time-space point process up to that time.

2.  Model  and  problem  statement 

We adopt the following model. Denote by $[t_0, \infty)$ and $R^m$ a
semi-infinite time interval and an m-dimensional Euclidean
space, respectively. We consider as observations a point
process on $[t_0, \infty) \times R^m$; thus, each observed point is
identified by a temporal coordinate $t \in [t_0, \infty)$ and a spatial
coordinate $r \in R^m$. Let T and A be Borel sets in $[t_0, \infty)$ and
$R^m$, respectively, and denote by $N(T \times A)$ the number of
points occurring in $T \times A$. We define $N(t) = N([t_0, t) \times R^m)$
as the number of points up to but not including time t
regardless of their spatial location. We use $\mathcal{X}_t$ to denote the
sequence of points up to time t; $\mathcal{X}_t$ consists of the number
N(t) and time-space coordinates $(t_1, r_1), (t_2, r_2), \ldots,
(t_{N(t)}, r_{N(t)})$ of all points in $[t_0, t) \times R^m$.

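We assume that

$$\lim_{\tau, \rho \to 0} (\tau\rho^m)^{-1} \Pr\{N([t, t+\tau) \times c(r, \rho)) = 1 \mid \mathcal{X}_t, x(\sigma); \sigma \ge t_0\}$$

$$= \lim_{\tau, \rho \to 0} (\tau\rho^m)^{-1} \Pr\{N([t, t+\tau) \times c(r, \rho)) \ge 1 \mid \mathcal{X}_t, x(\sigma); \sigma \ge t_0\}$$

$$= \lambda(t) \exp\{-\tfrac{1}{2}[r - H(t)x(t)]' R^{-1}(t)[r - H(t)x(t)]\}, \qquad (1)$$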

where: $c(r, \rho) = [r_1, r_1 + \rho) \times \cdots \times [r_m, r_m + \rho)$
is a cube in $R^m$; $\lambda(t)$ is a known function of t; H(t) is a
known, m x n matrix-valued function of t; R(t) is a known, symmetric,
positive-definite, m x m matrix-valued function of t; and
$\{x(t); t \ge t_0\}$ is the n-dimensional state of a linear stochastic
plant as defined below. Thus, the conditional probability that
a single point will be observed in a small time-space volume
$[t, t+\tau) \times c(r, \rho)$ given $\mathcal{X}_t$ and
$\{x(\sigma); \sigma \ge t_0\}$ is approximated to order $\tau\rho^m$ by
$\lambda(t, r, x(t))\tau\rho^m$, where we define

$$\lambda(t, r, x(t)) = \lambda(t) \exp\{-\tfrac{1}{2}[r - H(t)x(t)]' R^{-1}(t)[r - H(t)x(t)]\}.$$
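To make the observation model concrete, the following sketch evaluates
the space-time intensity of (1) and the resulting small-volume
approximation to the probability of observing a single point. The
function names and the numerical values are hypothetical, and the caller
is assumed to supply the inverse of R(t) and the centroid H(t)x(t)
directly.

```python
import math


def intensity(lam_t, r, Hx, R_inv):
    """lambda(t) * exp(-0.5 * (r - Hx)' R^{-1} (r - Hx)) for a point at r.

    lam_t is the temporal intensity lambda(t), Hx is the centroid H(t)x(t),
    and R_inv is the inverse of the positive-definite matrix R(t).
    """
    e = [ri - hi for ri, hi in zip(r, Hx)]
    m = len(e)
    quad = sum(e[i] * R_inv[i][j] * e[j] for i in range(m) for j in range(m))
    return lam_t * math.exp(-0.5 * quad)


def single_point_probability(lam_t, r, Hx, R_inv, tau, rho):
    """Approximate probability of one point in [t, t+tau) x c(r, rho)."""
    m = len(r)
    return intensity(lam_t, r, Hx, R_inv) * tau * rho ** m


# Hypothetical two-dimensional example (m = 2) with R(t) equal to the identity.
R_inv = [[1.0, 0.0],
         [0.0, 1.0]]
centroid = [1.0, 2.0]

peak = intensity(3.0, [1.0, 2.0], centroid, R_inv)      # at the centroid
off_peak = intensity(3.0, [2.0, 2.0], centroid, R_inv)  # one unit away
p_small = single_point_probability(3.0, [1.0, 2.0], centroid, R_inv,
                                   tau=1e-3, rho=1e-2)
```

The intensity peaks at the centroid H(t)x(t) and its contours of constant
value are ellipsoids determined by R(t), which is how the plant state
enters the observations.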

We assume that the process $\{x(t); t \ge t_0\}$ is defined for
$t \ge t_0$ by the following linear stochastic differential equation

$$dx(t) = F(t)x(t)\,dt + G(t)u(t)\,dt + V(t)\,dv(t), \qquad x(t_0) = x_0, \qquad (2)$$

where: F(t), G(t), and V(t) are known n x n, n x k, and
n x d matrix-valued functions of t, respectively; $\{u(t); t \ge t_0\}$
is a k-dimensional control input; $\{v(t); t \ge t_0\}$ is a standard,
d-dimensional Wiener process such that for any time $t \ge t_0$
the future $\{v(\sigma); \sigma \ge t\}$ of v is independent of
$\mathcal{X}_t$; and the random initial state, $x_0$, is assumed to be
normal with mean-value vector $\bar{x}_0$ and covariance matrix
$\Sigma_0$.
Consider now the stochastic optimal control problem
involving the linear dynamic system (2) and the average
quadratic cost functional

$$J[u] = E\left\{\int_{t_0}^{T} [u'(t)P(t)u(t) + x'(t)Q(t)x(t)]\,dt + x'(T)Sx(T)\right\}, \qquad (3)$$

where P, Q, and S are given matrices such that P(t) is
positive definite for $t \in [t_0, T]$, Q(t) is non-negative definite
for $t \in [t_0, T]$, and S is non-negative definite. Attention is
restricted to the so-called classical information pattern in
which the control input u(t) at each time $t \in [t_0, T]$ depends
on the observations $\mathcal{X}_t$ up to that time. In other words, we
consider control laws $\mu(\cdot,\cdot)$ that map pairs of the form
$(\mathcal{X}_t, t)$ into $u(t) = \mu(\mathcal{X}_t, t)$. In view of this,
the symbol u(t) will henceforth denote the control law $\mu$ evaluated at
$(\mathcal{X}_t, t)$. In order to emphasize that the cost functional (3)
therefore depends on the choice of control law $\mu$ we shall write
$J[\mu]$ instead of J[u]. We seek the control law $\mu_0$ that minimizes
$J[\mu]$.
It is well-known that when observations of x have the
linear, additive form

$$dz(t) = H(t)x(t)\,dt + dw(t), \qquad (4)$$

where w is a standard Wiener process, the control law that
minimizes $J[\mu]$ is defined by
$u_0(t) = -P^{-1}(t)G'(t)K(t)\hat{x}(t)$,
where $\hat{x}(t)$ is the causal minimum mean-square-error
estimate of x(t) in terms of past data
$\{z(\sigma); t_0 \le \sigma < t\}$, and
K(t) is the precomputable solution to the matrix Riccati
equation

$$dK(t)/dt = -K(t)F(t) - F'(t)K(t) + K(t)G(t)P^{-1}(t)G'(t)K(t) - Q(t) \qquad (5)$$

with the final condition K(T) = S. This result is usually called
the separation theorem because it shows that the solution to
this special version of the stochastic control problem can be
obtained by solving separately and independently a least-squares
control problem and a least-squares estimation
problem. In the next section, we shall show that an analogous
separation holds when the observations are in the form of the
time-space point process defined above. We note in this
instance that neither the plant state nor the observations are
normally distributed.
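Since K(t) in (5) is precomputable, it can be obtained by integrating the
Riccati equation backward from the final condition K(T) = S. The sketch
below does this with a plain Euler scheme for a scalar, time-invariant
plant; the scalar restriction, the Euler discretization, the parameter
values, and the function name are all assumptions made for illustration.
For the particular choice F = 0, G = P = Q = 1, S = 0, equation (5)
reduces to dK/dt = K^2 - 1, whose solution is K(t) = tanh(T - t), which
gives a check on the numerics.

```python
import math


def riccati_backward(F, G, P, Q, S, T, steps):
    """Integrate the scalar form of (5) backward from K(T) = S to K(0).

    Scalar version: dK/dt = -2 F K + (G^2 / P) K^2 - Q.
    """
    dt = T / steps
    K = S
    for _ in range(steps):
        dK_dt = -2.0 * F * K + (G * G / P) * K * K - Q
        K -= dt * dK_dt  # Euler step from time t back to time t - dt
    return K


# Hypothetical scalar plant for which the closed-form solution is tanh(T - t).
T = 2.0
K0 = riccati_backward(F=0.0, G=1.0, P=1.0, Q=1.0, S=0.0, T=T, steps=20000)

# Certainty-equivalent feedback gain at t = 0: u0(0) = -gain0 * x_hat(0).
gain0 = (1.0 / 1.0) * 1.0 * K0  # P^{-1} G' K(0) with P = G = 1
```

The same gain is used whatever estimator supplies the state estimate,
which is the practical content of the separation result.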

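We will need the following lemma, which can be proven in
exactly the same manner as Lemma 1 in [1].

Lemma 1. Denote by $p_t(X \mid \mathcal{X}_t)$ the conditional probability
density of x(t) given $\mathcal{X}_t$ for $t \ge t_0$. Then

$$dp_t(X \mid \mathcal{X}_t) = L[p_t(X \mid \mathcal{X}_t)]\,dt + p_t(X \mid \mathcal{X}_t) \int_{R^m} [\lambda(t, r, X) - \hat{\lambda}(t, r)]\, \hat{\lambda}^{-1}(t, r)\, N(dt \times dr), \qquad (6)$$

where we define

$$\hat{\lambda}(t, r) = E[\lambda(t, r, x(t)) \mid \mathcal{X}_t]$$

and where $L[(\cdot)]$ is the following partial-differential operator

$$L[(\cdot)] = -\sum_{i=1}^{n} \frac{\partial\{[FX + Gu]_i (\cdot)\}}{\partial X_i} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2\{[VV']_{ij} (\cdot)\}}{\partial X_i\, \partial X_j}$$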


3. Estimator-controller solution
The optimal control law $\mu_0$ that minimizes the average
quadratic cost $J[\mu]$ when the observations are of the
nonlinear, nongaussian type described above is given in the
following proposition.

Proposition 1. Under the above assumptions, the optimal
control law $\mu_0$ that minimizes $J[\mu]$ is defined by

$$u_0(t) = -P^{-1}(t)G'(t)K(t)\hat{x}(t), \qquad (7)$$

where K(t) satisfies (5), and $\hat{x}(t) = E[x(t) \mid \mathcal{X}_t]$
satisfies

$$d\hat{x}(t) = F(t)\hat{x}(t)\,dt + G(t)u_0(t)\,dt + \int_{R^m} M(t)[r - H(t)\hat{x}(t)]\, N(dt \times dr), \qquad \hat{x}(t_0) = \bar{x}_0, \qquad (8)$$

$$d\Sigma(t) = F(t)\Sigma(t)\,dt + \Sigma(t)F'(t)\,dt + V(t)V'(t)\,dt - \int_{R^m} M(t)H(t)\Sigma(t)\, N(dt \times dr); \qquad \Sigma(t_0) = \Sigma_0, \qquad (9)$$

$$M(t) = \Sigma(t)H'(t)[H(t)\Sigma(t)H'(t) + R(t)]^{-1}. \qquad (10)$$

Furthermore, x(t) is conditionally Gaussian with mean $\hat{x}(t)$
and covariance $\Sigma(t)$ given $\mathcal{X}_t$.
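The structure of the estimator (8)-(10) is easiest to see in the scalar
case. The sketch below, which assumes scalar time-invariant parameters
and an Euler step between points, propagates the conditional mean and
variance while no point is observed and applies the jump prescribed by
(8)-(10) when a point occurs at spatial location r. The function names
and numerical values are hypothetical.

```python
def propagate(x_hat, sigma, u, dt, F, G, V):
    """Euler step of (8)-(9) over an interval containing no observed point."""
    x_hat = x_hat + (F * x_hat + G * u) * dt
    sigma = sigma + (2.0 * F * sigma + V * V) * dt
    return x_hat, sigma


def update_at_point(x_hat, sigma, r, H, R):
    """Jump in the mean and variance when a point is observed at location r."""
    M = sigma * H / (H * sigma * H + R)  # gain (10), scalar form
    x_hat = x_hat + M * (r - H * x_hat)  # jump term in (8)
    sigma = sigma - M * H * sigma        # jump term in (9)
    return x_hat, sigma


# Hypothetical scalar example: prior mean 0 and variance 1, point seen at r = 2.
x_hat, sigma = update_at_point(0.0, 1.0, r=2.0, H=1.0, R=1.0)

# Between points the variance grows under the driving noise (F = 0, V = 1).
x_next, sigma_next = propagate(x_hat, sigma, u=0.0, dt=0.1, F=0.0, G=0.0, V=1.0)
```

Each observed point reduces the conditional variance through a gain of the
same form as a discrete-time Kalman measurement update, while the times
and locations of the points make the overall estimator nonlinear in the
data.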

Proof of Proposition 1. According to Åström [2], the cost
function $J[\mu]$ can be rewritten as

$$J[\mu] = \int_{t_0}^{T} E\{\|u(t) + \Lambda(t)\hat{x}(t)\|^2_{P(t)}\}\,dt + E\{x_0' K(t_0) x_0\} + \int_{t_0}^{T} \mathrm{tr}[B(t)E(\Sigma(t)) + V(t)V'(t)K(t)]\,dt$$

where $\hat{x}(t) = E[x(t) \mid \mathcal{X}_t]$ is the causal minimum mean
square-error estimate of x(t) given $\mathcal{X}_t$, $\Sigma(t)$ is the
corresponding conditional error covariance given $\mathcal{X}_t$,
$\Lambda(t) = P^{-1}(t)G'(t)K(t)$, $B(t) = \Lambda'(t)P(t)\Lambda(t)$,
$\|\theta\|^2_P = \theta'P\theta$ for any vector $\theta$, and
$\mathrm{tr}[\cdot]$ denotes trace. It is evident by virtue of the
non-negativity of the first term on the right in this expression
that the optimal control law $\mu_0$ is defined by $u_0(t)$ in (7)
provided that $\Sigma(t)$, and so $E[\Sigma(t)]$, is independent of the
choice of control law. We now demonstrate this independence
by arguing first, that for any causal control law, x(t) is
conditionally normal given $\mathcal{X}_t$. This can be verified by using
(6) and paralleling the proof by induction of Proposition 1 in
[1]; it is found for a control law defined by
$u(t) = \mu(\mathcal{X}_t, t)$, that x(t) is conditionally normal given
$\mathcal{X}_t$ with a conditional mean and covariance which satisfy
(8)-(10) with $u_0(t)$ replaced by u(t). It follows as a special case
that these assertions hold for the particular choice $u(t) = u_0(t)$.
Now, examination of (9) shows that the only way $\Sigma(t)$ can be
influenced by the choice of control law is through the point
process

$$\int_{R^m} N(dt \times dr).$$

However, it is evident (see [3] or [4] for a proof) that
$\{N(t); t \ge t_0\}$ is a point process with rate function

$$\int_{R^m} \lambda(t, r, x(t))\,dr = (2\pi)^{m/2} \lambda(t) \det{}^{1/2}[R(t)]. \qquad (12)$$

As this rate function is independent of both $\mathcal{X}_t$ and
$\{x(\sigma); \sigma \ge t_0\}$, it follows that $\{N(t); t \ge t_0\}$ is
a Poisson process with a rate that is independent of the choice of
control. Hence, $\Sigma(t)$ is independent of the control law, and
Proposition 1 is then established.
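The key step in the proof is that the spatially integrated rate (12) does
not depend on the state. A quick numerical check for the scalar case
m = 1 is sketched below: the Gaussian-shaped intensity is integrated over
a fixed spatial window by the trapezoidal rule for two different
centroids H(t)x(t). The window, the step count, the parameter values, and
the function name are assumptions made for illustration.

```python
import math


def integrated_rate(lam_t, centroid, R, half_width=40.0, steps=80000):
    """Trapezoidal integral over r of lam_t * exp(-0.5 * (r - centroid)^2 / R).

    The window [-half_width, half_width] is fixed, so the centroid enters
    only through the integrand.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        r = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * lam_t * math.exp(-0.5 * (r - centroid) ** 2 / R)
    return total * h


lam_t, R = 1.5, 2.0
rate_centered = integrated_rate(lam_t, 0.0, R)
rate_shifted = integrated_rate(lam_t, 3.5, R)
rate_formula = math.sqrt(2.0 * math.pi) * lam_t * math.sqrt(R)  # (12), m = 1
```

Both integrals agree with the right-hand side of (12) to within numerical
error, reflecting the fact that moving the centroid only translates the
intensity profile without changing its total mass.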


4.  Application  to  optical  tracking 

Communication  systems  that  employ  a narrow  beam  of 
light  as  a carrier,  star-tracking  systems,  and  infra-red  tracking 
systems  all  have  a requirement  for  position  sensing  and 
active  tracking  to  maintain  optical  boresight  in  the  presence 
of a variety of disturbances [5]. The requirements can be
quite  stringent  with  a design  goal  of  a few  microradians  of 
angular  tracking  accuracy  not  being  uncommon.  The  above 
estimator-controller  solution  provides  a possible  tool  for  the 
design  of  an  optical  tracking  system  under  the  following 
idealized  conditions. 

Let I(t, r) denote the light intensity at time $t \in [t_0, \infty)$ and
position $r \in \mathcal{R}$ of an optical field incident on the
photoemissive surface of a two-dimensional photodetector on boresight
and without any motions. Here, $\mathcal{R}$ is a subregion of $R^2$
corresponding to the photoemissive surface. We assume a
Gaussian intensity profile

$$I(t, r) = I_0(t) \exp\{-\tfrac{1}{2} r' R^{-1}(t) r\}.$$

Vibration, beam steering due to propagation of the light beam
through atmospheric turbulence, and other effects cause the
spot of light on the photoemissive surface to move about in a
random fashion. In this case, the intensity profile becomes

$$I(t, r, y_m(t)) = I_0(t) \exp\{-\tfrac{1}{2}[r - y_m(t)]' R^{-1}(t)[r - y_m(t)]\},$$

where $y_m(t)$ models the random motions. We assume that
$\{y_m(t); t \ge t_0\}$ is derived from a Gaussian diffusion satisfying

$$dx_m(t) = F_m(t)x_m(t)\,dt + V_m(t)\,dv_m(t),$$
$$y_m(t) = H_m(t)x_m(t),$$

where $\{v_m(t); t \ge t_0\}$ is a standard Wiener process. The
purpose of the tracking controller is to compensate for these
random motions in order to maintain optical boresight. Thus,
in the presence of a controller to position telescopes,
mirrors, or other pointing devices, the intensity becomes

$$I(t, r, y_m(t), y_c(t)) = I_0(t) \exp\{-\tfrac{1}{2}[r - y_m(t) + y_c(t)]' R^{-1}(t)[r - y_m(t) + y_c(t)]\}$$

where $y_c(t) - y_m(t)$ is the tracking error. Ideally, this error
should be zero, but this cannot be accomplished for two
reasons: the position error $y_m(t)$ is unknown and must be
estimated from data available at the photodetector output,
and the tracking devices will have some inertia so that $y_m(t)$
cannot be tracked instantaneously even if it were known. We
model the tracking devices by a linear stochastic plant

$$dx_c(t) = F_c(t)x_c(t)\,dt + G_c(t)u(t)\,dt + V_c(t)\,dv_c(t)$$
$$y_c(t) = H_c(t)x_c(t),$$

where u(t) is the input to the tracking devices from the
tracking controller, and $\{v_c(t); t \ge t_0\}$ is a standard Wiener
process modeling local disturbances such as those due to
vibration.

Photoelectron conversions take place in the photoemissive
surface at a rate proportional to the incident light
intensity [6]. Thus, the photoelectron conversion rate has the
form of $\lambda(t, r, x(t))$ for
$(t, r) \in [t_0, \infty) \times \mathcal{R}$ with
$\lambda(t) = \eta I_0(t)/h\nu$, where $\eta$ is the quantum efficiency
of the photoemitter, h is Planck's constant, and $\nu$ is the optical
frequency. Here, x is the vector obtained by adjoining $x_m$ and
$x_c$, and H is obtained from $H_m$ and $H_c$ in an obvious way.

The problem of optical tracking is to follow the position of
maximum light intensity at time t in terms of photoelectron
conversions observed on $[t_0, t) \times \mathcal{R}$. Except for the
finiteness of $\mathcal{R}$, this problem is identical to the control
problem studied above when photoelectron conversions are identified as
points. An approximation that appears reasonable when the
beam is small and the tracking errors are small (i.e. fine
tracking mode rather than an acquisition mode) compared to
the size of the photoemissive surface is to replace $\mathcal{R}$ by
$R^2$. With this approximation, the optical tracking problem is
solved by the result in Proposition 1.

5.  Conclusion 

The  solution  has  been  given  for  a stochastic  control 
problem  involving  observations  of  a time-space  point 
process.  The  solution  is  in  terms  of  a separated  estimator- 
controller  in  which  the  estimator  is  nonlinear  but  closely 
related  to  a linear,  discrete-time  Kalman-Bucy  filter,  and  the 
controller  is  the  certainty  equivalent  linear  controller.  The 
estimation performance, $E[\Sigma(t)]$, and control performance
corresponding  to  this  have  not  been  given  and,  indeed,  appear 
extremely  difficult  to  evaluate  exactly.  We  have  established 
lower  and  upper  bounds  on  these  performances  which  will  be 
given  in  another  paper  [7]. 
ABSTRACT

Estimation and control problems are examined for a class of models
involving a linear system, a quadratic cost, and observations that include
a space-time point process as well as the familiar "signal in additive
Wiener process" measurements. Motivation for this class of models is given
in terms of position sensing and tracking for quantum-limited optical
communication problems. These models include as special cases several
simpler ones considered previously. As in the simpler cases, the optimum
estimator is finite-dimensional and nonlinear, and the optimum controller
separates into the optimum estimator followed by the certainty-equivalent
control law.

Although the optimum estimator and the optimum controller are
finite-dimensional, the corresponding expected error covariance and optimum cost
require  infinite-dimensional  calculations.  This  motivates  the  derivation 
of  easily-computed  upper  and  lower  bounds  on  estimator  and  controller 
performance.  The  upper  bounds  are  derived  by  evaluating  exactly  the 
performance  of  a parametrized  family  of  suboptimum  designs;  one  of  these 
is  identified  as  having  smaller  performance  than  any  other,  thus  providing 
a minimal  upper  bound  within  this  family.  The  lower  bounds  are  obtained 
directly  by  calculations  involving  inequalities. 
Snyder and Fishman [1] have considered the problem of estimating the
Gaussian state of a linear stochastic system from observations of a point
process in which each point has both a spatial and a temporal co-ordinate.
The state of the system influences the spatial component of the intensity of
the observed space-time point process: at any given time, the contours of
constant spatial intensity are ellipsoids whose common centroid depends
linearly on the current system state. The temporal component of the
intensity is assumed in [1] to be deterministic. The conditional density of
the system state at any time given the past of the observation process is
shown to be Gaussian, and the conditional mean and the conditional covariance
satisfy finite-dimensional nonlinear stochastic differential equations that
are driven by the observed space-time point process.
This model has been generalized in [2], [3] to include causal feedback
interactions between the observed point process and the state of the linear
stochastic  system.  Although  inclusion  of  a feedback  (control)  term  destroys 
the  Gaussian-ness  of  the  system  state  process,  it  does  not  alter  either  the 
Gaussian  form  of  the  conditional  density  of  the  state  given  past  observa- 
tions or  the  finite-dimensionality  of  the  stochastic  differential  equations 
for  the  conditional  mean  and  the  conditional  covariance.  These  and  related 
properties underlie the derivation of a separation theorem for a stochastic
optimal  control  problem  involving  these  system  and  observation  processes  and 
a quadratic  cost  functional.  Motivation  for  this  stochastic  control 
problem  is  given  in  [2],  [3]  in  terms  of  position  sensing  and  tracking  for 
quantum-limited  optical  communication  problems. 

In  this  paper,  we  first  generalize  the  model  of  [2],  [3]  in  two  ways. 

On  the  one  hand,  the  space-time  point  process  observations  are  supplemented 
by  continuous  observations  of  a linear  function  of  the  system  state  in  an 
additive Wiener process. The optimum estimator for a restricted version
of  this  problem  is  included  in  the  dissertation  [4]  and  a corresponding 
separation  theorem  is  to  be  included  in  a forthcoming  paper  [5].  Here  we 
remove  the  requirement  in  [4],  [5]  that  the  supplementary  observations  have 
the  same  dimensions  as  the  spatial  component  of  the  space-time  point  process. 
On  the  other  hand,  we  allow  the  temporal  component  of  the  intensity  of  the 
observed  space-time  point  process  to  be  itself  a random  process.  Under 
appropriate  independence  assumptions,  it  is  shown  that  the  joint  problem  of 
estimating  the  state  of  the  system  and  the  temporal  intensity  reduces 
to  two  separate  problems,  one  of  which  is  that  considered  in  [2],  [3]  while 
the  other  is  a standard  estimation  problem  for  point  process  observations 
having  no  spatial  component,  as  discussed,  e.g.,  in  [6].  All  properties 
needed  to  extend  the  separation  theorem  for  stochastic  control  problems  are 
retained.  These  two  generalizations  are  discussed  later  in  terms  of  the 
optical  position-sensing  and  tracking  problem  that  motivated  [2],  [3]. 

Second,  we  examine  estimation  and  control  performance  via  upper  and 
lower  bounds.  While  in  all  cases  the  optimum  estimator  and  the  corres- 
ponding conditional  error  covariance  satisfy  finite-dimensional  stochastic 
differential  equations  and  thus  can  be  computed  on-line,  both  depend  on  the 
observed  space-time  point  process  and  cannot  be  precomputed.  Insofar  as  the 
conditional  covariance  is  concerned,  this  contrasts  with  the  precomput- 
ability that  holds  for  the  Kalman  filter.  One  is  therefore  led  to  consider 
the  expectation  of  the  conditional  covariance,  both  as  a natural  measure 
of  estimation  performance  in  its  own  right  and  because  it  happens  to  be  the 
particular  measure  of  estimation  performance  that  determines  the  optimum 
cost  in  the  stochastic  control  problems  considered  here  and  in  [2],  [3]. 
However,  while  the  expectation  of  the  conditional  covariance  is  determin- 
istic and  in  principle  can  be  precalculated,  this  calculation  turns  out  to 
be  infinite-dimensional.  With  this  in  mind,  we  derive  in  Sections  IV  and  V 
easily precalculable matrix-ordering upper and lower bounds on the expected
conditional covariance. The upper bounds are obtained by determining the
exact performance of each estimator in a parametrized family of suboptimal
estimators whose structure is similar to that of the optimum estimator but
for which the mean-square error is precomputable. From within this class,
we identify a particular suboptimum estimator whose mean-square error lies
at all times below that of any other in the matrix ordering sense. The
lower bound is obtained directly using differential and other inequalities.

II.  FORMULATION  OF  THE  ESTIMATION  AND  CONTROL  PROBLEMS 
Consider  the  stochastic  linear  system 

$$dx_t = F(t)x_t\,dt + G(t)u_t\,dt + V(t)\,dv_t \tag{1a}$$

$$dz_t = C(t)x_t\,dt + dw_t; \qquad z_0 = 0 \tag{1b}$$

where the state $x_t$ is an $n$-dimensional random vector, the control $u_t$ is a
$k$-dimensional vector whose measurability is defined later, $v$ and $w$ are
independent (normalized) $\ell$- and $q$-dimensional Wiener processes, the random
initial state $x_0$ of (1a) is independent of $v$ and $w$ and is Gaussian with
mean $\bar{x}_0$ and covariance $\Sigma_0$, and the deterministic uniformly bounded
matrix-valued time functions $F(\cdot)$, $G(\cdot)$, $V(\cdot)$ and $C(\cdot)$ have
the appropriate dimensions.

In addition to observations of the process $z$, there are also available
observations of a space-time point process defined on $[0,\infty) \times \mathbb{R}^m$
as follows. Each point occurrence is identified by a temporal co-ordinate
$t \in [0,\infty)$ and a spatial co-ordinate $r \in \mathbb{R}^m$. Let $\tau$ and $A$
be Borel sets in $[0,\infty)$ and $\mathbb{R}^m$, respectively, and denote by
$N(\tau \times A)$ the number of points occurring in $\tau \times A$. We define
$N_t = N([0,t) \times \mathbb{R}^m)$ to be the number of points up to but not
including time $t$ regardless of their spatial location; $N_t$ is taken to be a
Poisson counting process with intensity $\mu_t$, where $\mu$ is a stochastic
process that is independent of $x_0$, $v$ and $w$, and $\mu_t$ is almost-surely
positive. Given that $N$ has a jump at $t$ (i.e. a point occurs at time $t$), the
spatial location $r$ of the point is taken to be an $m$-dimensional Gaussian
random vector with mean $H(t)x_t$ and known positive definite covariance
$R(t)$, where $H(\cdot)$ is a known $m \times n$-matrix valued time function.
Given $N_s$ and $x_s$ for $s \ge 0$, the spatial locations are independent random
vectors that are independent of all other random entities.

Thus the space-time point process can be thought of as having an intensity

$$\lambda_t(r, x_t, \mu_t) = \mu_t\,\gamma_t(r, x_t) \tag{2}$$

that separates into the product of a temporal component $\mu_t$ that underlies
the Poisson counting process $N_t$ and a spatial component

$$\gamma_t(r, x_t) \sim N(H(t)x_t, R(t)) = (2\pi)^{-m/2}[\det R(t)]^{-1/2}\exp\{-\tfrac{1}{2}(r - H(t)x_t)'R^{-1}(t)(r - H(t)x_t)\}$$

that gives the density of the spatial location $r$ of the point occurring at $t$.

Let $(\Omega, \mathcal{F}, P)$ be the underlying probability space. We denote by
$\mathcal{Z}_t$ the sub-$\sigma$-algebra of $\mathcal{F}$ generated by the process $z$
over the interval $[0,t)$, and by $\mathcal{W}_t$ the sub-$\sigma$-algebra generated
by the space-time point process over $[0,t)$. Let
$\mathcal{B}_t = \mathcal{Z}_t \vee \mathcal{W}_t$ be the $\sigma$-algebra generated by
$\mathcal{Z}_t$ and $\mathcal{W}_t$. It is assumed throughout that $u_t$ is
$\mathcal{B}_t$-measurable and such that the solution to (1a) is
well-defined; such controls will be henceforth called admissible.

The estimation problem to which we address ourselves is to find the
conditional means

$$\hat{x}_t \triangleq E[x_t \mid \mathcal{B}_t], \qquad \hat{\mu}_t \triangleq E[\mu_t \mid \mathcal{B}_t] \tag{3}$$

and the corresponding conditional covariances

$$\Sigma_t \triangleq \operatorname{cov}[x_t \mid \mathcal{B}_t], \qquad \operatorname{cov}[\mu_t \mid \mathcal{B}_t]. \tag{4}$$

The control problem we examine is to find the admissible control
$\{u_t;\ t \in [0,T]\}$ that minimizes the quadratic cost functional

$$J(u) = E\left\{\int_0^T [u_t'P(t)u_t + x_t'Q(t)x_t]\,dt + x_T'Sx_T\right\} \tag{5}$$

where the symmetric uniformly-bounded matrix-valued time functions have the
appropriate dimensions with $Q(t)$ and $S$ non-negative definite and $P(t)$
positive definite.

Our notation is generally as follows: lower case letters denote
vectors, upper case letters denote matrices, and script letters denote
$\sigma$-algebras; $v_t$ denotes a time-indexed random vector, in contrast to $v(t)$
which denotes a time-indexed deterministic vector; everything takes place
on the fixed, finite time interval $[0,T]$; $y \sim N(q,Q)$ means that $y$ is
Gaussian with mean $q$ and covariance $Q$; the inequality $P \le Q$ between
symmetric, non-negative definite matrices means that $Q - P$ is non-negative
definite.

III.  SOLUTION  OF  THE  OPTIMAL  ESTIMATION  AND  CONTROL  PROBLEMS

Theorem 1. The conditional density of $x_t$ given $\mathcal{B}_t$ is Gaussian and
the conditional mean and the conditional covariance satisfy the
finite-dimensional nonlinear stochastic differential equations:

$$d\hat{x}_t = F(t)\hat{x}_t\,dt + G(t)u_t\,dt + \Sigma_t C'(t)[dz_t - C(t)\hat{x}_t\,dt] + \int_{\mathbb{R}^m} M_t[r - H(t)\hat{x}_t]\,N(dt \times dr); \qquad \hat{x}_0 = E[x_0] \tag{6}$$

$$d\Sigma_t = F(t)\Sigma_t\,dt + \Sigma_t F'(t)\,dt + V(t)V'(t)\,dt - \Sigma_t C'(t)C(t)\Sigma_t\,dt - M_t H(t)\Sigma_t\,dN_t; \qquad \Sigma_0 = \operatorname{cov}[x_0] \tag{7}$$

where

$$M_t = \Sigma_t H'(t)[H(t)\Sigma_t H'(t) + R(t)]^{-1} \tag{8}$$

If $\operatorname{cov}[x_0]$ is positive definite then $\Sigma_t$ is almost-surely
positive definite and its inverse satisfies the finite-dimensional nonlinear
stochastic differential equation

$$d\Sigma_t^{-1} = -\Sigma_t^{-1}F(t)\,dt - F'(t)\Sigma_t^{-1}\,dt - \Sigma_t^{-1}V(t)V'(t)\Sigma_t^{-1}\,dt + C'(t)C(t)\,dt + H'(t)R^{-1}(t)H(t)\,dN_t; \qquad \Sigma_0^{-1} = (\operatorname{cov}[x_0])^{-1} \tag{9}$$

The conditional density of $\mu_t$ given $\mathcal{B}_t$ coincides with the
conditional density of $\mu_t$ given the $\sigma$-algebra $\mathcal{T}_t$ generated by
the past of the process $N_t$, i.e.
$f_t(\mu \mid \mathcal{B}_t) = f_t(\mu \mid \mathcal{T}_t)$, assuming the control $u_t$
satisfies a technical property specified in the proof and discussed
immediately thereafter.
Proof. The derivation of (6) - (9) we give here parallels the proofs of
Lemma 1 and Proposition 1 in [1] which establish the corresponding result
for the special case where $u \equiv 0$, $\mu_t = \mu(t)$ is deterministic, and
$C(t) \equiv 0$ (i.e. the observations $z$ are not available). In outline, the
modifications that are made to include each of these generalizations are as
follows: the introduction of a $\mathcal{B}_t$-measurable $u$ causes no difficulty
since $u_t$ is deterministic in all calculations which involve probability
measures conditioned on $\mathcal{B}_t$; the presence of nonzero $C(\cdot)$ merely
adds an additional term that is familiar for this "signal in Wiener process"
observation model; the generalization to random $\mu_t$ is handled by
temporarily conditioning everything on the $\sigma$-algebra $\mathcal{M}$ generated
by $\mu$ over $[0,T]$ and subsequently finding that the stochastic differential
equation for the conditional density of $x_t$ given $\mathcal{B}_t$ and
$\mathcal{M}$ turns out to be independent of $\mathcal{M}$.

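Letting $\phi_t = \exp[jy'x_t]$, where $y \in \mathbb{R}^n$ is nonrandom, we find
using the Ito rule that

$$d\phi_t = \phi_t\psi_t\,dt + \phi_t\,jy'V(t)\,dv_t$$

where

$$\psi_t = jy'[F(t)x_t + G(t)u_t] - \tfrac{1}{2}y'V(t)V'(t)y$$

Letting $\mathcal{F}_t = \mathcal{B}_t \vee \mathcal{M}$ and defining for the moment
$\hat{x}_t = E[x_t \mid \mathcal{F}_t]$ and
$\hat{\lambda}_t(r) = E[\lambda_t(r,x_t,\mu_t) \mid \mathcal{F}_t]$, it follows from
our standing assumptions that, for any Borel set $B \subset \mathbb{R}^m$,
$dz_t - C(t)\hat{x}_t\,dt$ and $N(dt \times B) - \int_B \hat{\lambda}_t(r)\,dr\,dt$ are
independent, independent-increment processes relative to $\mathcal{F}_t$. We then
have, analogously to [1, Eq. 9], that the conditional characteristic function
$M_t(jy) = E[\phi_t \mid \mathcal{B}_t \vee \mathcal{M}]$ of $x_t$ given
$\mathcal{B}_t \vee \mathcal{M}$ satisfies

$$dM_t(jy) = E[\phi_t\psi_t \mid \mathcal{F}_t]\,dt + E\{\phi_t(x_t - \hat{x}_t)' \mid \mathcal{F}_t\}C'(t)[dz_t - C(t)\hat{x}_t\,dt] + \int_{\mathbb{R}^m} E\{\phi_t[\lambda_t(r,x_t,\mu_t) - \hat{\lambda}_t(r)] \mid \mathcal{F}_t\}\,\hat{\lambda}_t^{-1}(r)[N(dt \times dr) - \hat{\lambda}_t(r)\,dr\,dt]$$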

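Taking inverse Fourier transforms and simplifying then yields the following
stochastic differential equation for the conditional density of $x_t$ given
$\mathcal{F}_t$ (c.f. [1, Eq. 5])

$$dp_t(X \mid \mathcal{B}_t \vee \mathcal{M}) = L[p_t(X \mid \mathcal{F}_t)]\,dt + p_t(X \mid \mathcal{F}_t)[X - \hat{x}_t]'C'(t)[dz_t - C(t)\hat{x}_t\,dt] + p_t(X \mid \mathcal{F}_t)\int_{\mathbb{R}^m}[\lambda_t(r,X,\mu_t) - \hat{\lambda}_t(r)]\hat{\lambda}_t^{-1}(r)\,N(dt \times dr) \tag{10}$$

where

$$L[q] = -\sum_{i=1}^{n} \frac{\partial[(F(t)X + G(t)u_t)_i\,q]}{\partial X_i} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2[(V(t)V'(t))_{ij}\,q]}{\partial X_i\,\partial X_j}$$

Recalling from (2) that $\lambda_t(r,x_t,\mu_t) = \mu_t\gamma_t(r,x_t)$, we see that the
integrand of the last term in (10) can be rewritten as
$[\gamma_t(r,X) - \hat{\gamma}_t(r)]\hat{\gamma}_t^{-1}(r)$, where
$\hat{\gamma}_t(r) = E[\gamma_t(r,x_t) \mid \mathcal{F}_t]$. Noting that
$\hat{\gamma}_t(r)$ and $\hat{x}_t$ can be written as integrals involving
$p_t(X \mid \mathcal{F}_t)$, the evolution in time of (10) does not depend in
any way on $\mathcal{M}$.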


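Thus $p_t(X \mid \mathcal{B}_t \vee \mathcal{M}) = p_t(X \mid \mathcal{B}_t)$, and (10)
can be rewritten as

$$dp_t(X \mid \mathcal{B}_t) = L[p_t(X \mid \mathcal{B}_t)]\,dt + p_t(X \mid \mathcal{B}_t)[X - \hat{x}_t]'C'(t)[dz_t - C(t)\hat{x}_t\,dt] + p_t(X \mid \mathcal{B}_t)\int_{\mathbb{R}^m}[\gamma_t(r,X) - \hat{\gamma}_t(r)]\hat{\gamma}_t^{-1}(r)\,N(dt \times dr) \tag{11}$$

with the Gaussian distribution $N(\bar{x}_0, \Sigma_0)$ as initial condition. Of
course, $E[x_t \mid \mathcal{B}_t \vee \mathcal{M}] = E[x_t \mid \mathcal{B}_t]$, so the
definition of $\hat{x}_t$ given in the statement of Theorem 1 coincides with the
temporary definition introduced in the proof; similar remarks apply to
$\hat{\gamma}_t$.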

The proof that $p_t(X \mid \mathcal{B}_t)$ is Gaussian with mean $\hat{x}_t$ given by
(6) and covariance $\Sigma_t$ given by (7) or (9) then follows from a
straightforward inductive proof similar to that of Proposition 1 in [1]. In
the intervals between point occurrences of the space-time point process,
$p_t(X \mid \mathcal{B}_t)$ evolves according to the first two terms on the right
side of (11); this is simply Kushner's equation for linear system (1a) with
linear observations (1b), and is known to yield a conditional density
$p_t(X \mid \mathcal{B}_t)$ that is Gaussian with mean $\hat{x}_t$ and covariance
$\Sigma_t$ satisfying (6) and (7) or (9), respectively, with the last term on
the right side of each deleted. At those instants when a point occurs in the
space-time point process, a jump occurs in $p_t(X \mid \mathcal{B}_t)$ because of
the last term on the right side of (11). However, it turns out that
$p_t(X \mid \mathcal{B}_t)$ remains Gaussian after this jump because it was
Gaussian before the jump and because the spatial intensity $\gamma_t(r,X)$ is
Gaussian. As in [1], calculation of the last term on the right side of (11)
shows that the jump in the conditional mean is given by the last term on
the right side of (6), the jump in conditional covariance by the last term
on the right side of (7), and the jump in the inverse of the conditional
covariance by the last term on the right side of (9).

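Finally, to prove that $f_t[\mu \mid \mathcal{B}_t] = f_t[\mu \mid \mathcal{T}_t]$, let
$x^*_t$ satisfy $dx^*_t = F(t)x^*_t\,dt + G(t)u_t\,dt$; $x^*_0 = E[x_0]$. Then
$\tilde{x}_t \triangleq x_t - x^*_t$ and $\tilde{z}_t$, defined by
$d\tilde{z}_t \triangleq dz_t - C(t)x^*_t\,dt$, satisfy

$$d\tilde{x}_t = F(t)\tilde{x}_t\,dt + V(t)\,dv_t \tag{12a}$$

$$d\tilde{z}_t = C(t)\tilde{x}_t\,dt + dw_t \tag{12b}$$

with $\tilde{x}_0 \sim N(0,\Sigma_0)$ and $\tilde{z}_0 = 0$, while
$\tilde{r} \triangleq r - Hx^*_t$ has spatial intensity

$$\tilde{\gamma}_t(\tilde{r}, \tilde{x}_t) \sim N(H(t)\tilde{x}_t, R(t)).$$

Let $\tilde{\mathcal{Z}}_t$ be the $\sigma$-algebra generated by $\tilde{z}$ over $[0,t)$
and let $\tilde{\mathcal{W}}_t$ be that generated by the space-time process that
is obtained from the original one by leaving $N_t$ unchanged and replacing $r$
by $r - Hx^*_t$. Then, under the assumption that $u_t$ is also
$\tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{W}}_t$-measurable, an argument that
parallels the proof of Lemma 1 in [7] shows that
$\tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{W}}_t = \mathcal{Z}_t \vee \mathcal{W}_t$. This
assumption is discussed shortly. Thus, it is equivalent to prove that
$f_t[\mu \mid \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{W}}_t] = f_t[\mu \mid \mathcal{T}_t]$.
Now, because $\mu$ and $N$ are independent of $x_0$, $v$ and $w$, so also are they
independent of $\tilde{x}$ and $\tilde{z}$; thus, the joint density of $\mu$ and the
event that $t_1, t_2, \ldots, t_k$ are the occurrence times of $N$ over $[0,t)$
satisfies
$f[\mu, t_1, \ldots, t_k \mid \tilde{\mathcal{Z}}_t] = f[\mu, t_1, \ldots, t_k]$.
Equivalently,

$$f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t]\,f[t_1, \ldots, t_k \mid \tilde{\mathcal{Z}}_t] = f[\mu \mid \mathcal{T}_t]\,f[t_1, \ldots, t_k]$$

Because $N$ and $\tilde{z}$ are independent,
$f[t_1, \ldots, t_k \mid \tilde{\mathcal{Z}}_t] = f[t_1, \ldots, t_k]$ and therefore

$$f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t] = f[\mu \mid \mathcal{T}_t] \tag{12c}$$

Also because the spatial components $\tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_k$ are
independent of $\mu$ given $\tilde{\mathcal{X}}_t$ and $\mathcal{T}_t$, we have

$$f[\mu, \tilde{r}_1, \ldots, \tilde{r}_k \mid \mathcal{T}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{X}}_t]\,f[\tilde{r}_1, \ldots, \tilde{r}_k \mid \mathcal{T}_t \vee \tilde{\mathcal{X}}_t]$$

where $\tilde{\mathcal{X}}_t$ is the $\sigma$-algebra generated by $\tilde{x}$ over
$[0,t)$. Replacing $\tilde{\mathcal{Z}}_t$ by $\tilde{\mathcal{X}}_t$ in the argument
leading to (12c), we have
$f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \mathcal{T}_t]$ for the
first term on the right side, while the left side can be written
$f[\mu \mid \tilde{r}_1, \ldots, \tilde{r}_k, \mathcal{T}_t \vee \tilde{\mathcal{X}}_t]\,f[\tilde{r}_1, \ldots, \tilde{r}_k \mid \mathcal{T}_t \vee \tilde{\mathcal{X}}_t]$.
Cancelling the second term on each side, we are left with

$$(f[\mu \mid \tilde{\mathcal{W}}_t \vee \tilde{\mathcal{X}}_t] =)\ f[\mu \mid \tilde{r}_1, \ldots, \tilde{r}_k, \mathcal{T}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \mathcal{T}_t],$$

which in combination with (12c) gives the required result.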

Remark 1. The technical assumption that $u_t$ is
$\tilde{\mathcal{W}}_t \vee \tilde{\mathcal{Z}}_t$-measurable which is required for the
proof that $f_t[\mu \mid \mathcal{B}_t] = f_t[\mu \mid \mathcal{T}_t]$ also arises as a
sufficient condition in [7]. A generalization of [5, Theorem 3] shows that
this will be the case if $u$ is generated from the past of $z$, $N$ and $r$ or
$\tilde{z}$, $N$ and $\tilde{r}$ using a suitably smooth control law. Specifically,
it will be so if $u$ is generated as a Lipschitz function of the state of a
suitably smooth finite-dimensional system; included here, in particular, is a
control $u$ so generated from $\hat{x}_t$ of (6), which is of interest because
this is the case for the optimum control found later.

Remark 2. The stochastic differential equations for $\hat{x}_t$ and $\Sigma_t$
given in Theorem 1 admit an intuitively simple interpretation. In the
intervals between point occurrences of the space-time process, the problem
reduces to one of estimating $x_t$ from the observations $z$; the
non-occurrence of further points in these intervals provides information
about $\mu_t$ but none about $x_t$ because of the separability (2) of $\lambda_t$
and our standing assumptions concerning independence. Thus, during these
intervals we are left with the standard Kalman filtering problem of
estimating the state of the linear system (1a) from the observations (1b); if
$x_t$ is conditionally Gaussian at the beginning of each such interval, it
remains so throughout with mean and covariance which evolve according to (6)
and (7) or (9) with the last term deleted in each. (This, of course, is also
reflected in the equation (11) for the conditional density reducing to
Kushner's equation during these intervals.) We now observe that $x_t$ is, in
fact, conditionally Gaussian at the beginning of each such interval because
it is at $t=0$ and because it remains Gaussian after each point occurrence:
indeed, at each occurrence $(t,r)$ of the space-time point process the spatial
observation $r$ is an independent observation on a Gaussian random variable
with mean $Hx_t$ and covariance $R$; this is equivalent to a discrete
observation of the form

$$r = H(t)x_t + \zeta$$

where $\zeta \sim N(0,R)$ is independent of $x$ and $z$. Thus, from standard
estimation theory for Gaussian random variables [e.g. 10], the conditional
density remains Gaussian and the change in conditional mean and covariance of
$x_t$ after accounting for this new observation are

$$\Delta\hat{x}_t \triangleq \hat{x}_{t+} - \hat{x}_t = \Sigma_t H'[H\Sigma_t H' + R]^{-1}(r - H\hat{x}_t) = M_t(r - H\hat{x}_t)$$

$$\Delta\Sigma_t \triangleq \Sigma_{t+} - \Sigma_t = -\Sigma_t H'[H\Sigma_t H' + R]^{-1}H\Sigma_t = -M_t H\Sigma_t$$

$$\Delta\Sigma_t^{-1} \triangleq \Sigma_{t+}^{-1} - \Sigma_t^{-1} = H'R^{-1}H$$

Of course, this term is to be included only when an occurrence takes place
at $(t,r)$; multiplying each of these expressions by $N(dt \times dr)$ and
integrating over $\mathbb{R}^m$ takes care of this, and constitutes the last term
in (6), (7) and (9), respectively.
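
The jump relations of Remark 2 can be checked numerically. The following is
a minimal sketch for the scalar case n = m = 1 and is not taken from the
report: the function name `jump_update` and all numerical values are
illustrative assumptions. It applies the gain (8) at a single point
occurrence and confirms that the inverse variance increases by
$H'R^{-1}H$, in agreement with the last term of (9).

```python
def jump_update(x_hat, sigma, r, h, r_var):
    """Scalar version of the jump terms in (6) and (7) at one point occurrence.

    x_hat, sigma : conditional mean and variance just before the point
    r            : observed spatial location of the point
    h, r_var     : scalar H(t) and R(t)
    """
    gain = sigma * h / (h * sigma * h + r_var)      # M_t of (8)
    x_new = x_hat + gain * (r - h * x_hat)          # jump in the mean, cf. (6)
    sigma_new = sigma - gain * h * sigma            # jump in the variance, cf. (7)
    return x_new, sigma_new


# Illustrative (assumed) values, not from the report.
x_hat, sigma, h, r_var = 0.0, 2.0, 1.5, 0.5
x_new, sigma_new = jump_update(x_hat, sigma, r=1.0, h=h, r_var=r_var)

# The inverse variance should increase by h * h / r_var, cf. (9).
info_gain = 1.0 / sigma_new - 1.0 / sigma
```

In the matrix case the same three lines apply with the scalar division
replaced by a matrix inverse.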
Remark 3. It was observed in the proof that
$p_t(X \mid \mathcal{B}_t \vee \mathcal{M})$ does not depend on $\mathcal{M}$, and thus
$x_t$ and $\mu$ are conditionally independent given $\mathcal{B}_t$; in particular,
$x_t$ and $\mu_t$ are conditionally independent given $\mathcal{B}_t$, and thus the
joint problem of estimating $x_t$ and $\mu_t$ given $\mathcal{B}_t$ separates into
two separate problems of estimating $x_t$ and estimating $\mu_t$:

$$f(x_t, \mu_t \mid \mathcal{B}_t) = f(x_t \mid \mathcal{B}_t)\,f(\mu_t \mid \mathcal{B}_t)$$

Furthermore, the final part of Theorem 1 establishes that
$f(\mu_t \mid \mathcal{B}_t) = f(\mu_t \mid \mathcal{T}_t)$ depends only on the Poisson
counting process $N$ and does not depend on $z$ or the spatial locations of the
points. This latter estimation problem with various models for $\mu$ is
examined, for example, in [6].

Theorem 2. The unique admissible control $u^\circ$ that minimizes the cost (5)
is given by

$$u^\circ_t = -P^{-1}(t)G'(t)K(t)\hat{x}_t \triangleq -L(t)\hat{x}_t \tag{13}$$

where $\hat{x}_t = E[x_t \mid \mathcal{B}_t]$ satisfies the finite-dimensional
nonlinear stochastic differential equation (6) with $\Sigma_t$ given by (7), and
the $n \times n$ symmetric non-negative definite matrix $K(t)$ satisfies the
Riccati equation

$$\dot{K}(t) = -K(t)F(t) - F'(t)K(t) + K(t)G(t)P^{-1}(t)G'(t)K(t) - Q(t); \qquad K(T) = S \tag{14}$$

The corresponding minimum value of $J$ is

$$J[u^\circ] = E\{x_0'K(0)x_0\} + \int_0^T \operatorname{tr}[KGP^{-1}G'K\,E\{\Sigma_t\} + KVV']\,dt, \tag{15}$$

where tr denotes trace.

Proof. According to Astrom [8], $J[u]$ can be rewritten as

$$J[u] = E\left\{\int_0^T \|u_t + L(t)\hat{x}_t\|_P^2\,dt\right\} + [\text{Right side of (15)}]$$

where $\hat{x}_t = E[x_t \mid \mathcal{B}_t]$ and $\|y\|_P^2 = y'Py$. The first term
on the right side is non-negative, and zero if and only if
$u_t = -L(t)\hat{x}_t$. Thus (13) gives the unique optimum control provided the
right side of (15) is invariant under changes in $u$. The only way for (15)
to be $u$-dependent is through $E\{\Sigma_t\}$, and (7) shows that the only
possibility for $\Sigma_t$ to vary with $u$ is via $N_t$. But $N_t$ is a Poisson
counting process with rate $\mu_t$, and both $N_t$ and $\mu_t$ are specified at the
outset as mappings on $(\Omega, \mathcal{F}, P)$ without any reference to $u$.
Hence $\Sigma_t$ and, therefore, $E\{\Sigma_t\}$ and the right side of (15) are
invariant under changes in $u$, and the proof is complete.

Remark 4. Theorem 2 shows that the solution to this stochastic control
problem can be realized with a separated estimator-controller in which the
estimator is nonlinear, mean-square optimal, and finite-dimensional and
the controller is the certainty-equivalent linear control law (i.e. the
optimum linear control law for the deterministic problem in which
$x_0 = E[x_0]$, $v_t = E[v_t] \equiv 0$, and $x_t$ is known exactly). This result,
therefore, includes as special cases the familiar linear-quadratic-Gaussian
"Separation Theorem" (where the space-time point process observations are
absent) [e.g. 8] and the similar results in [1] - [5] for restricted versions
of the space-time point process observations.

We observe that $\Sigma_t$, the conditional covariance of $x_t$ given
$\mathcal{B}_t$, is not precomputable because the last term on the right side of
(7) depends on the particular realization of the counting process $N_t$. One
is, therefore, led to consider $\bar{\Sigma}(t) \triangleq E\{\Sigma_t\}$ both as a
natural measure of estimation performance in its own right and as the
particular measure of estimation performance that determines the optimum
control cost (15). However, while $\bar{\Sigma}(t)$ is deterministic and in
principle can be precalculated, this calculation is infinite-dimensional.
One way of seeing this is to observe that an attempt to calculate
$\bar{\Sigma}(t)$ by taking expectations of both sides of (7) is complicated by
the last term on the right side, which requires the calculation of
$E\{\Sigma H'[H\Sigma H' + R]^{-1}H\Sigma\}$. While a differential equation for this
can be written down, it, in turn, requires expectations of additional
nonlinear functions of $\Sigma_t$, and so on ad infinitum in a mushrooming
requirement for additional terms that is familiar from other nonlinear
filtering situations. Accordingly, we turn our attention to deriving
easily-computed upper and lower bounds on $\bar{\Sigma}(t)$. These estimation
bounds then directly imply upper and lower bounds on the optimum control
performance (15).


IV.  SUBOPTIMUM  ESTIMATORS  AND  UPPER  BOUNDS 

Our approach to finding easily-computed upper bounds on $E[\Sigma_t]$ is to
examine a parametrized family of suboptimum estimators whose mean-square
performance can easily be calculated exactly. For each suboptimum estimator
the corresponding mean-square error is then trivially a matrix-ordering
upper bound on $E[\Sigma_t]$. Furthermore, we show that there exists a
suboptimum estimator in this family whose mean-square error is at all times
smaller than that of any other, thus providing a minimal upper bound within
this class.


Motivated by the form of the optimum estimator (6), we consider the
family of suboptimum estimators

$$d\check{x}_t = F(t)\check{x}_t\,dt + G(t)u_t\,dt + N(t)[dz_t - C(t)\check{x}_t\,dt] + \int_{\mathbb{R}^m} M(t)[r - H(t)\check{x}_t]\,N(dt \times dr) \tag{16}$$

parametrized by the deterministic uniformly-bounded $n \times m$- and
$n \times q$-matrix valued time functions $M(\cdot)$ and $N(\cdot)$, respectively.
This family does not include the optimum estimator (6), in which $M_t$ is a
random matrix which depends on $N_t$ through $\Sigma_t$. Apart from the
requirement that $M(\cdot)$, $N(\cdot)$ be deterministic, the suboptimum
estimator (16) and the optimum estimator (6) share the same structure. The
nonrandomness of $M$ enables us to write down an ordinary $n \times n$-matrix
differential equation for the mean-square error of the suboptimum estimator
(16). Indeed, subtracting (16) from (1a), it follows directly by
straightforward calculation that

$$S(t) \triangleq E[(x_t - \check{x}_t)(x_t - \check{x}_t)'] \tag{17}$$

satisfies the linear matrix differential equation

$$\dot{S} = (F - NC)S + S(F - NC)' + VV' + NN' + \bar{\mu}\{M[HSH' + R]M' - MHS - SH'M'\}; \qquad S(0) = \operatorname{cov}[x_0] \tag{18}$$

where we have suppressed the common argument $t$ of all entries, and where
$\bar{\mu}(t) = E[\mu_t]$. Because all coefficients in (18) are uniformly bounded,
a unique solution to (18) exists for all $t \in [0,\infty)$. We thus have proved:

Theorem 3. For any uniformly bounded $M(\cdot)$ and $N(\cdot)$, the mean-square
performance (17) of the estimator (16) satisfies the linear matrix
differential equation (18), and this is a matrix ordering upper bound on
$E[\Sigma_t]$, i.e. for all $t \in [0,\infty)$,

$$\bar{\Sigma}(t) = E[\Sigma_t] \le S(t), \tag{19}$$

in the sense that $S(t) - \bar{\Sigma}(t)$ is non-negative definite.
We now show that there exists a choice of $M(\cdot)$ and $N(\cdot)$ in (16) for
which the corresponding mean-square performance given by (18) lies at all
times below that for any other choice.

Theorem 4. Let $S^*(\cdot)$, $M^*(\cdot)$ and $N^*(\cdot)$ satisfy

$$\dot{S}^* = FS^* + S^*F' + VV' - S^*C'CS^* - \bar{\mu}S^*H'[HS^*H' + R]^{-1}HS^*; \qquad S^*(0) = \operatorname{cov}[x_0] \tag{20}$$

$$M^* = S^*H'[HS^*H' + R]^{-1}, \qquad N^* = S^*C' \tag{21}$$

where the common argument $t$ of all entries in (20) and (21) is suppressed.
Let $S(\cdot)$ be the solution to (18) for some arbitrary $M(\cdot)$ and
$N(\cdot)$. Then, for all $t \in [0,\infty)$,

$$E[\Sigma_t] \le S^*(t) \le S(t) \tag{22}$$

and $S^*(t) = E\{(x_t - x^*_t)(x_t - x^*_t)'\}$ is the mean-square performance of
the bound-minimal estimator

$$dx^*_t = Fx^*_t\,dt + Gu_t\,dt + S^*C'[dz_t - Cx^*_t\,dt] + S^*H'[HS^*H' + R]^{-1}\int_{\mathbb{R}^m}[r - Hx^*_t]\,N(dt \times dr) \tag{23}$$

Proof. Completing the square on the right side of (18) yields

$$\dot{S} = FS + SF' + VV' - SC'CS - \bar{\mu}SH'[HSH' + R]^{-1}HS + (N - SC')(N - SC')' + \bar{\mu}[M - SH'(HSH' + R)^{-1}](HSH' + R)[M - SH'(HSH' + R)^{-1}]'; \qquad S(0) = \operatorname{cov}[x_0] \tag{24}$$

For given $S(t)$, the right side of (24) is clearly minimized by making the
last two non-negative definite terms 0, in which case the right side of (24)
reduces to that of (20) while the minimizing choices of $M(t)$ and $N(t)$ are
given by (21). It remains to show that this instantaneous ordering on the
time derivative produces a permanent ordering of the solutions over $[0,T]$,
i.e. that the solution to (20) lies at all times below that of (24) in the
matrix ordering. This is readily accomplished by using Lemma 1 in [9],
after appropriate modifications to reflect that initial conditions, rather
than final conditions, are of interest here. This means that in [9, Lemma 1]
the left sides of (*) and (**) should be replaced by $+\dot{X}$ and $+\dot{Y}$,
respectively, and all time orderings $0 \le t \le s \le T$ replaced by
$0 \le s \le t \le T$. Then, letting (24) play the role of (*) and (20) the role
of (**) and checking conditions 1) to 4) of [9, Lemma 1] we have: 1) and 2)
are trivial under our standing assumptions; 3) holds because, by a subsidiary
application of [9, Lemma 1], the solution to (20) lies at all times above
that of the Riccati equation

$$\dot{\Gamma} = F\Gamma + \Gamma F' + VV' - \Gamma[C'C + \bar{\mu}H'R^{-1}H]\Gamma; \qquad \Gamma(0) = \operatorname{cov}[x_0] \tag{25}$$

for which 3) is known to hold; finally, 4) holds for (18) and therefore
(24) because if $S_1(t)$ and $S_2(t)$ are the respective solutions to (18) with
initial conditions $S_1(0)$ and $S_2(0)$, $S_1(0) \ge S_2(0)$, then
$S_1(t) \ge S_2(t)$ for all $t$, since

$$\dot{S}_1 - \dot{S}_2 = (F - NC - \bar{\mu}MH)(S_1 - S_2) + (S_1 - S_2)(F - NC - \bar{\mu}MH)' + \bar{\mu}MH(S_1 - S_2)H'M'; \qquad (S_1 - S_2)(0) \ge 0$$

and the solution to this lies at all times above the (identically zero)
solution to

$$\dot{Y} = (F - NC - \bar{\mu}MH)Y + Y(F - NC - \bar{\mu}MH)'$$

by a further subsidiary application of [9, Lemma 1].
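
The completing-the-square step that turns (18) into (24) can be verified
numerically. The sketch below does this for scalar coefficients; the
function names `rhs_18` and `rhs_24` and all numerical values are
hypothetical and chosen only for illustration. It also checks that the
gains (21) give the smallest value of the right side of (18) for the given
$S$.

```python
def rhs_18(s, m_gain, n_gain, f, c, v, h, r, mu):
    """Right side of (18) for scalar coefficients."""
    return (2.0 * (f - n_gain * c) * s + v * v + n_gain * n_gain
            + mu * (m_gain * m_gain * (h * h * s + r) - 2.0 * m_gain * h * s))


def rhs_24(s, m_gain, n_gain, f, c, v, h, r, mu):
    """Right side of (24): the same quantity after completing the square."""
    a = h * h * s + r                       # H S H' + R
    m_star, n_star = s * h / a, s * c       # minimizing gains, cf. (21)
    return (2.0 * f * s + v * v - (s * c) ** 2 - mu * (s * h) ** 2 / a
            + (n_gain - n_star) ** 2 + mu * (m_gain - m_star) ** 2 * a)


# Illustrative (assumed) values.
params = dict(f=-0.5, c=0.7, v=1.0, h=1.2, r=0.3, mu=2.0)
s = 0.8
a = params["h"] ** 2 * s + params["r"]
m_star, n_star = s * params["h"] / a, s * params["c"]

arbitrary = rhs_18(s, 0.4, -0.2, **params)           # some non-optimal gains
same_after_square = rhs_24(s, 0.4, -0.2, **params)   # should equal `arbitrary`
minimal = rhs_18(s, m_star, n_star, **params)        # gains from (21)
rhs_20 = (2.0 * params["f"] * s + params["v"] ** 2 - (s * params["c"]) ** 2
          - params["mu"] * (s * params["h"]) ** 2 / a)
```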

Remark 5. The evaluation of the performance of the suboptimum estimator (16)
is a second-order analysis that uses only the means and covariances of the
various random variables and processes and makes no use of the Gaussian-ness
of $x_0$, $v$, $w$ and $r$. Thus the results of this section remain valid if $v$
and $w$ are replaced by normalized uncorrelated-increment processes that are
uncorrelated with each other and with $x_0$ and $r$, with $x_0$ having any
distribution with mean $\bar{x}_0$ and covariance $\Sigma_0$ and the spatial
intensity $\gamma_t(r,x_t)$ being any distribution with mean $Hx_t$ and
covariance $R$ such that $r$ is uncorrelated with $x_t$. The bound-minimal
estimator (23) can then be viewed as the best estimator in the family (16).
These estimators are bilinear because of the product $r \cdot N(dt \times dr)$ in
the last term, though they might also be considered to be in a sense linear,
to the extent that $N(dt \times dr)$ merely signals the arrival of a spatial
observation $r$ which, as with $z$, is utilized linearly in the production of
the estimate.

V.  ESTIMATION  LOWER  BOUNDS  AND  CONTROL  BOUNDS 

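Theorem 5. Let $S_*$ be the solution to (25). Then, for all $t \ge 0$, $S_*(t)$
is a matrix-ordering lower bound on $E[\Sigma_t]$, i.e.

$$S_*(t) \le E[\Sigma_t] \tag{26}$$

The corresponding lower bound on the optimum control cost is

$$J[u^\circ] \ge E\{x_0'K(0)x_0\} + \int_0^T \operatorname{tr}[KGP^{-1}G'KS_* + KVV']\,dt \tag{27}$$

Proof. We have from (9) that $\overline{\Sigma^{-1}}(t) \triangleq E[\Sigma_t^{-1}]$
satisfies

$$\frac{d}{dt}\overline{\Sigma^{-1}} = -\overline{\Sigma^{-1}}F - F'\overline{\Sigma^{-1}} - E[\Sigma_t^{-1}VV'\Sigma_t^{-1}] + C'C + \bar{\mu}H'R^{-1}H; \qquad \overline{\Sigma^{-1}}(0) = (\operatorname{cov}[x_0])^{-1}$$

$$= -\overline{\Sigma^{-1}}F - F'\overline{\Sigma^{-1}} - \overline{\Sigma^{-1}}VV'\overline{\Sigma^{-1}} + C'C + \bar{\mu}H'R^{-1}H - \Delta$$

where

$$\Delta = E[\Sigma_t^{-1}VV'\Sigma_t^{-1}] - \overline{\Sigma^{-1}}VV'\overline{\Sigma^{-1}} = \operatorname{cov}[\Sigma_t^{-1}V] \ge 0$$

It then follows from [9, Lemma 1] that $\overline{\Sigma^{-1}}$ lies at all times
below the solution to

$$\dot{\Xi} = -\Xi F - F'\Xi - \Xi VV'\Xi + C'C + \bar{\mu}H'R^{-1}H; \qquad \Xi(0) = (\operatorname{cov}[x_0])^{-1} \tag{30}$$

so that

$$\Xi(t) \ge \overline{\Sigma^{-1}}(t) \triangleq E[\Sigma_t^{-1}] \ge (E\Sigma_t)^{-1}, \tag{31}$$

the last inequality being a matrix version of Jensen's inequality proved
in the Appendix. Taking inverses of (31) and noting that if $S_*$ is the
solution to (25) then $S_*^{-1}$ is the solution to (30), we have the desired
result (26). The control bound (27) then follows by combining (15) and (26).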


We remark in passing that (25) gives the covariance of the optimum
estimator when the space-time point process observations are replaced by
continuous observations of the form

$$dy_t = Hx_t\,dt + (\bar{\mu}^{-1}R)^{1/2}\,dn_t$$

where $n_t$ is a Wiener process independent of $x_0$, $v$ and $w$.
Remark 6. Comparing (20) for the minimal upper bound with (25) for the
lower bound, we see that these two bounds will be close to each other and
thus to $E[\Sigma_t]$ if $HS^*H'$ is small compared with $R$ (or, equivalently,
if $H'R^{-1}H$ is small compared with $(S^*)^{-1}$). Both bounds will also be
close to each other and to the optimum performance if the mean intensity
$\bar{\mu}$ is small. These are discussed later in terms of our motivating example.
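Remark 6 can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it assumes a scalar state with constant coefficients, takes the scalar forms of (20) and (25) as they appear in Theorem 4 and its proof, and integrates both with a simple Euler step. All names (`upper_rate`, `lower_rate`, `integrate_bounds`) and parameter values are hypothetical.

```python
def upper_rate(s, f, v, c, h, r, mu):
    """Scalar right side of (20): rate of change of the minimal upper bound."""
    return 2.0 * f * s + v * v - (c * s) ** 2 - mu * (h * s) ** 2 / (h * h * s + r)


def lower_rate(g, f, v, c, h, r, mu):
    """Scalar right side of (25): rate of change of the lower bound."""
    return 2.0 * f * g + v * v - (c * c + mu * h * h / r) * g * g


def integrate_bounds(f, v, c, h, r, mu, s0, horizon, dt):
    """Euler-integrate both bounds from the common initial variance s0.

    Returns (lower, upper), two lists sampled at 0, dt, 2*dt, ...
    """
    lower, upper = [s0], [s0]
    for _ in range(int(round(horizon / dt))):
        g, s = lower[-1], upper[-1]
        lower.append(g + dt * lower_rate(g, f, v, c, h, r, mu))
        upper.append(s + dt * upper_rate(s, f, v, c, h, r, mu))
    return lower, upper
```

Under these assumptions, running `integrate_bounds` once with a small `r` and once with a large `r` (so that $HS^*H'$ is small compared with $R$) shows the final gap `upper[-1] - lower[-1]` shrinking in the second case, in line with the remark.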

Once we have deduced upper and lower bounds on the estimation performance
$E[\Sigma_t]$, corresponding bounds on the optimum control performance follow
directly by substitution of these bounds for $E[\Sigma_t]$ in (15):

Theorem 6. Upper and lower bounds on the optimum control performance $J[u^*]$
of (15) are

$$E\{x_0'K(0)x_0\} + \int_0^T \mathrm{tr}[KGP^{-1}G'KS_* + KVV']\,dt \;\le\; J[u^*] \;\le\; E\{x_0'K(0)x_0\} + \int_0^T \mathrm{tr}[KGP^{-1}G'KS^* + KVV']\,dt$$

where $S_*$ is the solution to (25) and $S^*$ is the solution to (20).

VI.  DISCUSSION 

The  above  estimator-controller  solution  extends  results  in  [2]  and  [3] 
to include a more general form of observation. Just as with the observation
model  in  [2]  and  [3],  this  more  general  observation  is  motivated  by  communi- 
cation systems  that  employ  a narrow  beam  of  light  as  a carrier,  by  star 
tracking  systems,  and  by  infra-red  tracking  systems,  all  of  which  have  a 
requirement  for  position  sensing  and  active  tracking  to  maintain  optical 
alignment  in  the  presence  of  a variety  of  disturbances.  We  shall  indicate 
how  the  models  of  [2,  Sec.  4]  and  [3,  Sec.  4]  are  usefully  extended  by  this 
more  general  observation.  The  estimator-controller  solution  of  Theorem  2 
provides  a possible  tool  for  the  design  of  an  optical  tracking  system  under 
the  conditions  indicated  below,  and  the  performance  bounds  of  Sections  IV 
and  V provide  the  means  for  predicting  the  performance  of  such  designs. 

Let $I(t,r)$ denote the light intensity at time $t \in [0,\infty)$ and position
$r \in \mathcal{R}$ of an optical field incident on the photoemissive surface of a
two-dimensional photodetector on boresight and without any motions. Here,
$\mathcal{R}$ is a subregion of $R^2$ corresponding to the photoemissive surface.
We assume a Gaussian intensity profile

$$I(t,r) = I_0(t)\exp\{-\tfrac{1}{2}\,r'R^{-1}(t)\,r\}.$$

Vibration, beam steering due to propagation of the light beam through
atmospheric turbulence, and other effects cause the spot of light on the
photoemissive surface to move about in a random fashion and to fluctuate
randomly in optical intensity. In this case, the intensity profile becomes

$$I(t,r,y_m(t)) = I_0(t)\exp\{-\tfrac{1}{2}[r - y_m(t)]'R^{-1}(t)[r - y_m(t)]\},$$

where $y_m(t)$ models the random motions, and $I_0(t)$ is a random process (e.g.,
a lognormal process) that models random intensity fluctuations. We assume
that $\{y_m(t);\, t \ge 0\}$ is derived from a Gaussian diffusion satisfying

$$dx_m(t) = F_m(t)x_m(t)\,dt + V_m(t)\,dv_m(t), \qquad y_m(t) = H_m(t)x_m(t),$$

where $\{v_m(t);\, t \ge 0\}$ is a standard Wiener process. The fading process
$\{I_0(t);\, t \ge 0\}$ is assumed to be independent of the motion processes but is
otherwise arbitrary. The purpose of the tracking controller is to compensate
for these random motions and random fading in order to maintain optical
alignment.

Thus, in the presence of a controller to position telescopes, mirrors, or
other pointing devices, the intensity becomes

$$I(t,r,y_m(t),y_p(t)) = I_0(t)\exp\{-\tfrac{1}{2}[r - y_m(t) + y_p(t)]'R^{-1}(t)[r - y_m(t) + y_p(t)]\}$$

where $y_m(t) - y_p(t)$ is the tracking error. Ideally, this error should be
zero, but this cannot be accomplished for two reasons: the position error
$y_m(t)$ is unknown and must be estimated from data available at the
photodetector output, and the tracking devices will have some inertia so that
$y_m(t)$ cannot be tracked instantaneously even if it were known. We model the
tracking devices by a linear stochastic plant

$$dx_p(t) = F_p(t)x_p(t)\,dt + G_p(t)u(t)\,dt + V_p(t)\,dv_p(t)$$

$$y_p(t) = H_p(t)x_p(t),$$

where $u(t)$ is the input to the tracking devices from the tracking controller,
and $\{v_p(t);\, t \ge 0\}$ is a standard Wiener process modeling local disturbances
such as those due to vibration.

Photoelectron conversions take place in the photoemissive surface at
a rate proportional to the incident light intensity [3]. Thus, the
photoelectron conversion rate has the form of $\lambda_t(r, x_t, \mu_t)$ for
$(t,r) \in [0,\infty) \times \mathcal{R}$ with $\mu_t$ an appropriately scaled version
of $I_0(t)$, and $x$ is the vector obtained by adjoining $x_m$ and $x_p$, and $H$
is obtained from $H_m$ and $H_p$ in an obvious way.

The problem of optical tracking is to follow the position of maximum
light intensity at time $t$ in terms of both photoelectron conversions observed
on $[0,t) \times \mathcal{R}$ and observations of the plant state $x_p$ obtained with
sensors located at the tracking devices. These latter observations are modeled
according to (1b) so as to account for sensor noise. Except for the finiteness
of $\mathcal{R}$, this problem is identical to the control problem studied above
when photoelectron conversions are identified as space-time points. An
approximation that appears reasonable when the beam is small and the tracking
errors are small (i.e. fine tracking mode rather than an acquisition mode)
compared to the size of the photoemissive surface is to replace $\mathcal{R}$ by
$R^2$.

With this approximation, the optical tracking problem is solved by the
result in Theorem 2.

It is important to note that according to Remark 3 and Theorem 2, the
design of the tracking controller does not depend in any way upon the source
or nature of randomness in $I_0(t)$. Thus, for example, the same design is
obtained if $I_0(t)$ is random due to atmospheric turbulence or modulation by
an information-bearing signal or a combination of these.

The upper and lower bounds of Sections IV and V provide a measure of
the performance for the optical position-sensing and tracking system derived
from Theorem 2. From Remark 6, the upper and lower bounds merge when $HS^*H'$
is small compared to the beam spread as measured by $R$. It is evident that
the estimation and control lower bounds derived as above for observations
of each photoelectron conversion are also lower bounds for both optimal and
suboptimal trackers that employ observations obtained by temporal or spatial
averaging as would be obtained using photon counting and a quadrant
photomultiplier.

We  mention  also  that  A.  Segall  in  [11]  has  applied  the  models  of  [1] 
and  [3]  to  study  computer  communication  networks.  The  upper  and  lower 
bounds  on  performance  that  we  have  derived  can  be  applied  in  this  context 
as  well. 
APPENDIX

Lemma. Let $W_1$ and $W_2$ be positive definite matrices, and let
$\gamma \in [0,1]$. Then

$$[\gamma W_1 + (1-\gamma)W_2]^{-1} \le \gamma W_1^{-1} + (1-\gamma)W_2^{-1}, \tag{A1}$$

i.e. $W^{-1}$ is convex in a matrix sense. Furthermore, we have

$$E[W^{-1}] \ge (E[W])^{-1}$$

(cf. Jensen's inequality).

Proof. See [12].
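As a numerical sanity check of (A1), and not a substitute for the proof in [12], the sketch below evaluates both sides for $2 \times 2$ matrices and tests whether the difference is nonnegative definite. The helper names and the restriction to $2 \times 2$ symmetric matrices are assumptions made for illustration only.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mix(a, b, gamma):
    """Convex combination gamma*a + (1 - gamma)*b of two 2x2 matrices."""
    return [[gamma * a[i][j] + (1.0 - gamma) * b[i][j] for j in range(2)]
            for i in range(2)]


def convexity_gap(w1, w2, gamma):
    """Right side of (A1) minus left side of (A1)."""
    lhs = inv2(mix(w1, w2, gamma))
    rhs = mix(inv2(w1), inv2(w2), gamma)
    return [[rhs[i][j] - lhs[i][j] for j in range(2)] for i in range(2)]


def is_nonnegative_definite(a, tol=1e-12):
    """A symmetric 2x2 matrix is nonnegative definite iff trace and det are >= 0."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return trace >= -tol and det >= -tol
```

Taking $\gamma = 1/2$ corresponds to a random matrix $W$ that equals $W_1$ or $W_2$ with equal probability, so a nonnegative definite gap is then an instance of $E[W^{-1}] \ge (E[W])^{-1}$.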
Abstract — Estimation and control problems are examined for a class of
models involving a linear system, a quadratic cost, and observations that
include a space-time point process as well as the familiar "signal in
additive Wiener process" measurements. Motivation for this class of
models is given in terms of position sensing and tracking for
quantum-limited optical communication problems. These models include as
special cases several simpler ones considered previously. As in the simpler
cases, the optimum estimator is finite-dimensional and nonlinear, and the
optimum controller separates into the optimum estimator followed by the
certainty-equivalent control law.

Although the optimum estimator and the optimum controller are
finite-dimensional, the corresponding expected error covariance and optimum
cost require infinite-dimensional calculations. This motivates the
derivation of easily computed upper and lower bounds on estimator and
controller performance. The upper bounds are derived by evaluating exactly
the performance of a parametrized family of suboptimum designs; one of
these is identified as having smaller mean-square error than any other, thus
providing a minimal upper bound within this family. The lower bounds are
obtained directly by calculations involving inequalities.


I. Introduction

Snyder and Fishman [1] have considered the problem of estimating the
Gaussian state of a linear stochastic system from observations of a point
process in which each point has both a spatial and a temporal
coordinate. The state of the system influences the spatial
component of the intensity of the observed space-time
point process; at any given time, the contours of constant
spatial intensity are ellipsoids whose common centroid
depends linearly on the current system state. The temporal
component of the intensity is assumed in [1] to be deterministic.
The conditional density of the system state at
any time given the past of the observation process is
shown to be Gaussian, and the conditional mean and the
conditional covariance satisfy finite-dimensional nonlinear
stochastic differential equations that are driven by the
observed space-time point process.

This model has been generalized in [2] and [3] to
include causal feedback interactions between the observed
point process and the state of the linear stochastic system.
Although inclusion of a feedback (control) term destroys
the Gaussianness of the system state process, it does not
alter either the Gaussian form of the conditional density of
the state given past observations or the finite-dimensionality
of the stochastic differential equations for the conditional
mean and the conditional covariance. These and
related properties underlie the derivation of a separation
theorem for a stochastic optimal control problem involving
these system and observation processes and a
quadratic cost functional. Motivation for this stochastic
control problem is given in [2] and [3] in terms of position
sensing and tracking for quantum-limited optical
communication problems.

In this paper, we first generalize the model of [2] and [3]
in two ways. On the one hand, the space-time point-process
observations are supplemented by continuous observations
of a linear function of the system state in an
additive Wiener process. The optimum estimator for a
restricted version of this problem is included in the
dissertation of Vaca [4] and a corresponding separation
theorem is to be included in a forthcoming paper [5]. Here we
remove the requirement in [4] and [5] that the supplementary
observations have the same dimensions as the spatial
component of the space-time point process. On the other
hand, we allow the temporal component of the intensity of
the observed space-time point process to be itself a random
process. Under appropriate independence assumptions,
it is shown that the joint problem of estimating the
state of the system and the temporal intensity reduces to
two separate problems, one of which is that considered in
[2] and [3] while the other is a standard estimation problem
for point-process observations having no spatial component,
as discussed, e.g., in [6]. All properties needed to
extend the separation theorem for stochastic control problems
are retained. These two generalizations are discussed
later in terms of the optical position-sensing and tracking
problem that motivated [2] and [3].

Second, we examine estimation and control performance
via upper and lower bounds. While in all cases the
optimum estimator and the corresponding conditional
error covariance satisfy finite-dimensional stochastic
differential equations and thus can be computed on-line,
both depend on the observed space-time point process
and cannot be precomputed. Insofar as the conditional
covariance is concerned, this contrasts with the
precomputability that holds for the Kalman filter. One is
therefore led to consider the expectation of the conditional
covariance, both as a natural measure of estimation
performance in its own right and because it happens to be the
particular measure of estimation performance that
determines the optimum cost in the stochastic control
problems considered here and in [2] and [3]. However, while
the expectation of the conditional covariance is deterministic
and in principle can be precalculated, this calculation
turns out to be infinite-dimensional. With this in mind, we
derive in Sections IV and V easily precalculable
matrix-ordering upper and lower bounds on the expected
conditional covariance. The upper bounds are obtained by
determining the exact performance of each estimator in a
parametrized family of suboptimal estimators whose
structure is similar to that of the optimum estimator but
for which the mean-square error is precomputable. From
within this class, we identify a particular suboptimum
estimator whose mean-square error lies at all times below
that of any other in the matrix ordering sense. The lower
bound is obtained directly using differential and other
inequalities.

II.  Formulation  of  the  Estimation  and 
Control  Problems 

Consider the stochastic linear system

$$dx_t = F(t)x_t\,dt + G(t)u_t\,dt + V(t)\,dv_t \tag{1a}$$

$$dz_t = C(t)x_t\,dt + dw_t, \qquad z_0 = 0 \tag{1b}$$

where the state $x_t$ is an $n$-dimensional random vector, the
control $u_t$ is a $k$-dimensional vector whose measurability is
defined later, $v$ and $w$ are independent (normalized) Wiener
processes, $v$ being $l$-dimensional and $w$ having the dimension of $z$,
the random initial state $x_0$ of (1a) is independent of $v$ and $w$ and is
Gaussian with mean $\bar{x}_0$ and covariance $\Sigma_0$, and the deterministic
uniformly bounded matrix-valued time functions $F(\cdot)$, $G(\cdot)$,
$V(\cdot)$, and $C(\cdot)$ have the appropriate dimensions.

In addition to observations of the process $z$, there are
also available observations of a space-time point process
defined on $[0,\infty) \times R^m$ as follows. Each point occurrence
is identified by a temporal coordinate $t \in [0,\infty)$ and a
spatial coordinate $r \in R^m$. Let $\tau$ and $A$ be Borel sets in
$[0,\infty)$ and $R^m$, respectively, and denote by $N(\tau \times A)$ the
number of points occurring in $\tau \times A$. We define
$N_t \triangleq N([0,t) \times R^m)$ to be the number of points up to but not
including time $t$ regardless of their spatial location; $N_t$ is
taken to be a doubly stochastic Poisson counting process
with intensity $\mu_t$, with $\mu$ and $N$ stochastic processes that
are independent of $x_0$, $v$, and $w$, and $\mu_t$ is almost-surely
positive. Given that $N$ has a jump at $t$ (i.e., $dN_t = 1$),
the spatial location $r$ of the point is taken to be an
$m$-dimensional Gaussian random vector with mean $H(t)x_t$
and known positive definite covariance $R(t)$, where $H(\cdot)$
is a known $m \times n$ matrix-valued time function. Given $N_s$
and $x_s$ for $s \ge 0$, the spatial locations are independent
random vectors that are independent of all other random
entities. Thus, the space-time point process can be thought
of as having an intensity

$$\lambda_t(r, x_t, \mu_t) = \mu_t\,\gamma_t(r, x_t) \tag{2}$$

that separates into the product of a temporal component
$\mu_t$ that underlies the Poisson counting process $N$ and a
spatial component

$$\gamma_t(r, x_t) \sim N(H(t)x_t, R(t)) = (2\pi)^{-m/2}[\det R(t)]^{-1/2}\exp\{-\tfrac{1}{2}(r - H(t)x_t)'R^{-1}(t)(r - H(t)x_t)\} \tag{3}$$

that gives the density of the spatial location $r$ of the point
occurring at $t$.
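To make the observation model concrete, the following minimal sketch draws one realization of a space-time point process with the product intensity (2). It is an illustration under simplifying assumptions that are not made in the text: the temporal intensity is a constant `rate` (so $N$ is an ordinary Poisson process rather than a doubly stochastic one), the spatial coordinate is scalar ($m = 1$), and the state is frozen at a fixed value `x`. The function name and parameters are hypothetical.

```python
import random


def simulate_points(rate, horizon, h, x, r_var, seed=0):
    """Return a list of (time, location) pairs on [0, horizon).

    Occurrence times follow a Poisson process with constant intensity `rate`;
    each location is Gaussian with mean h*x and variance r_var, as in (3).
    """
    rng = random.Random(seed)
    points = []
    t = rng.expovariate(rate)          # first inter-arrival time
    while t < horizon:
        location = rng.gauss(h * x, r_var ** 0.5)
        points.append((t, location))
        t += rng.expovariate(rate)     # next inter-arrival time
    return points
```

With a time-varying or random $\mu_t$ the inter-arrival sampling would have to be replaced (for example by thinning), and a moving state would enter only through the mean `h * x` of each location.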

All random quantities are defined on a common underlying probability
space. We denote by $\mathcal{Z}_t$ the sub-$\sigma$-algebra generated by the
process $z$ over the interval $[0,t)$, and by $\mathcal{N}_t$ the
sub-$\sigma$-algebra generated by the space-time point process over $[0,t)$.
Let $\mathcal{B}_t = \mathcal{Z}_t \vee \mathcal{N}_t$ be the smallest
$\sigma$-algebra containing $\mathcal{Z}_t$ and $\mathcal{N}_t$. It is assumed
throughout that $u_t$ is $\mathcal{B}_t$-measurable and such that the solution
to (1a) is well-defined; such controls will henceforth be called admissible.

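The estimation problem to which we address ourselves
is to find the conditional means

$$\hat{x}_t = E[x_t \mid \mathcal{B}_t], \qquad \hat{\mu}_t = E[\mu_t \mid \mathcal{B}_t] \tag{4}$$

and the corresponding conditional covariances

$$\Sigma_t = \mathrm{cov}[x_t \mid \mathcal{B}_t], \qquad \Gamma_t = \mathrm{cov}[\mu_t \mid \mathcal{B}_t].$$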

The control problem we examine is to find the admissible
control $\{u_t : t \in [0,T]\}$ that minimizes the quadratic
cost functional

$$J[u] \triangleq E\left\{\int_0^T [u_t'P(t)u_t + x_t'Q(t)x_t]\,dt + x_T'Sx_T\right\} \tag{5}$$

where the symmetric uniformly-bounded matrix-valued
time functions have the appropriate dimensions with $Q(t)$
and $S$ nonnegative definite and $P(t)$ positive definite.

Our notation is generally as follows: lower case italic
letters denote vectors, upper case italic letters denote
matrices, and script letters denote $\sigma$-algebras; $v_t$ denotes a
time-indexed random vector, in contrast to $v(t)$ which
denotes a time-indexed deterministic vector; everything
takes place on the fixed, finite time interval $[0,T]$;
$y \sim N(q,Q)$ means that $y$ is Gaussian with mean $q$ and
covariance $Q$; the inequality $P \le Q$ between symmetric,
nonnegative definite matrices means that $Q - P$ is
nonnegative definite.

III.  Solution  of  the  Optimal  Estimation  and 
Control  Problems 

Theorem 1: The conditional density of $x_t$ given $\mathcal{B}_t$ is
Gaussian and the conditional mean and the conditional
covariance satisfy the finite-dimensional nonlinear
stochastic differential equations

$$d\hat{x}_t = F(t)\hat{x}_t\,dt + G(t)u_t\,dt + \Sigma_tC'(t)[dz_t - C(t)\hat{x}_t\,dt] + \int_{R^m} M_t[r - H(t)\hat{x}_t]\,N(dt \times dr), \qquad \hat{x}_0 = E[x_0] \tag{6}$$

$$d\Sigma_t = F(t)\Sigma_t\,dt + \Sigma_tF'(t)\,dt + V(t)V'(t)\,dt - \Sigma_tC'(t)C(t)\Sigma_t\,dt - M_tH(t)\Sigma_t\,dN_t, \qquad \Sigma_0 = \mathrm{cov}[x_0] \tag{7}$$

where

$$M_t = \Sigma_tH'(t)[H(t)\Sigma_tH'(t) + R(t)]^{-1}. \tag{8}$$

If $\mathrm{cov}[x_0]$ is positive definite then $\Sigma_t$ is almost-surely
positive definite and its inverse satisfies the finite-dimensional
nonlinear stochastic differential equation

$$d\Sigma_t^{-1} = -\Sigma_t^{-1}F(t)\,dt - F'(t)\Sigma_t^{-1}\,dt - \Sigma_t^{-1}V(t)V'(t)\Sigma_t^{-1}\,dt + C'(t)C(t)\,dt + H'(t)R^{-1}(t)H(t)\,dN_t, \qquad \Sigma_0^{-1} = (\mathrm{cov}[x_0])^{-1}. \tag{9}$$

The conditional density of $\mu_t$ given $\mathcal{B}_t$ coincides with
the conditional density of $\mu_t$ given the $\sigma$-algebra $\mathcal{T}_t$
generated by the past of the temporal process $N_t$, i.e.,
$f_t(\mu \mid \mathcal{B}_t) = f_t(\mu \mid \mathcal{T}_t)$, assuming the control
$u_t$ satisfies a technical property specified in the proof and discussed
immediately thereafter.

Proof: The derivation of (6)-(9) we give here parallels
the proofs of [1, lemma 1 and proposition 1] which establish
the corresponding result for the special case where
$u_t = 0$, $\mu_t = \mu(t)$ is deterministic, and $C(t) = 0$ (i.e., the
observations $z$ are not available). In outline, the modifications
that are made to include each of these generalizations
are as follows: the introduction of a $\mathcal{B}_t$-measurable
$u_t$ causes no difficulty since $u_t$ is deterministic in all
calculations which involve probability measures conditioned
on $\mathcal{B}_t$; the presence of nonzero $C(\cdot)$ merely adds
an additional term that is familiar for this "signal in
Wiener process" observation model; the generalization to
random $\mu_t$ is handled by temporarily conditioning everything
on the $\sigma$-algebra $\mathcal{M}$ generated by $\mu$ over $[0,T]$ and
subsequently finding that the stochastic differential equation
for the conditional density of $x_t$ given $\mathcal{B}_t$ and $\mathcal{M}$
turns out to be independent of $\mathcal{M}$.

Letting $\phi_t = \exp[jy'x_t]$ where $y \in R^n$ is nonrandom, we
find, using the Itô rule, that

$$d\phi_t = \phi_t\psi_t\,dt + \phi_t\,jy'V(t)\,dv_t$$

where

$$\psi_t = jy'[F(t)x_t + G(t)u_t] - \tfrac{1}{2}\,y'V(t)V'(t)y.$$

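Letting $\mathcal{G}_t = \mathcal{B}_t \vee \mathcal{M}$ and defining for the moment
$\hat{x}_t \triangleq E[x_t \mid \mathcal{G}_t]$ and
$\hat{\lambda}_t(r) \triangleq E[\lambda_t(r, x_t, \mu_t) \mid \mathcal{G}_t]$, it follows
from our standing assumptions that, for any Borel set $B \subset R^m$,
$dz_t - C(t)\hat{x}_t\,dt$ and $N(dt \times B) - \int_B \hat{\lambda}_t(r)\,dr\,dt$ are
independent, independent-increment processes relative to $\mathcal{G}_t$. We
then have, analogously to [1, eq. (9)], that the conditional
characteristic function $M_t(jy) \triangleq E[\phi_t \mid \mathcal{G}_t]$ of $x_t$ given
$\mathcal{G}_t$ satisfies

$$dM_t(jy) = E\{\phi_t\psi_t \mid \mathcal{G}_t\}\,dt + E\{\phi_t(x_t - \hat{x}_t)' \mid \mathcal{G}_t\}\,C'(t)[dz_t - C(t)\hat{x}_t\,dt] + \int_{R^m} E\{\phi_t[\lambda_t(r, x_t, \mu_t) - \hat{\lambda}_t(r)] \mid \mathcal{G}_t\}\,\hat{\lambda}_t^{-1}(r)\,[N(dt \times dr) - \hat{\lambda}_t(r)\,dr\,dt].$$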

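Taking inverse Fourier transforms and simplifying then
yields the following stochastic differential equation for the
conditional density of $x_t$ given $\mathcal{G}_t$ (cf. [1, eq. (5)]):

$$dp_t(X \mid \mathcal{G}_t) = \mathcal{L}[p_t(X \mid \mathcal{G}_t)]\,dt + p_t(X \mid \mathcal{G}_t)[X - \hat{x}_t]'C'(t)[dz_t - C(t)\hat{x}_t\,dt] + p_t(X \mid \mathcal{G}_t)\int_{R^m}[\lambda_t(r, X, \mu_t) - \hat{\lambda}_t(r)]\,\hat{\lambda}_t^{-1}(r)\,N(dt \times dr) \tag{10}$$

where

$$\mathcal{L}[q] = -\sum_{i=1}^{n}\frac{\partial[\{F(t)X + G(t)u_t\}_i\,q]}{\partial X_i} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^2[\{V(t)V'(t)\}_{ij}\,q]}{\partial X_i\,\partial X_j}.$$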

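Recalling from (2) that $\lambda_t(r, x_t, \mu_t) = \mu_t\gamma_t(r, x_t)$, we see that
the integrand of the last term in (10) can be rewritten as
$[\gamma_t(r, X) - \hat{\gamma}_t(r)]\,\hat{\gamma}_t^{-1}(r)$, where
$\hat{\gamma}_t(r) \triangleq E[\gamma_t(r, x_t) \mid \mathcal{G}_t]$.
Noting that $\hat{\gamma}_t(r)$ and $\hat{x}_t$ can be written as integrals involving
$p_t(X \mid \mathcal{G}_t)$, the evolution in time of (10) does not depend
in any way on $\mu_t$. Furthermore, $p_0(X \mid \mathcal{G}_0) \sim N(\bar{x}_0, \Sigma_0)$ is
independent of $\mu$ by assumption. Thus,
$p_t(X \mid \mathcal{B}_t \vee \mathcal{M}) = p_t(X \mid \mathcal{B}_t)$, and (10) can be
rewritten as

$$dp_t(X \mid \mathcal{B}_t) = \mathcal{L}[p_t(X \mid \mathcal{B}_t)]\,dt + p_t(X \mid \mathcal{B}_t)[X - \hat{x}_t]'C'(t)[dz_t - C(t)\hat{x}_t\,dt] + p_t(X \mid \mathcal{B}_t)\int_{R^m}[\gamma_t(r, X) - \hat{\gamma}_t(r)]\,\hat{\gamma}_t^{-1}(r)\,N(dt \times dr) \tag{11}$$

with the Gaussian distribution $N(\bar{x}_0, \Sigma_0)$ as initial condition.
Of course, $E[x_t \mid \mathcal{B}_t \vee \mathcal{M}] = E[x_t \mid \mathcal{B}_t]$, so the
definition of $\hat{x}_t$ given in the statement of Theorem 1 coincides
with the temporary definition introduced in the proof;
similar remarks apply to $\hat{\gamma}_t$.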

The proof that $p_t(X \mid \mathcal{B}_t)$ is Gaussian with mean $\hat{x}_t$ given
by (6) and covariance $\Sigma_t$ given by (7) or (9) then follows
by a straightforward inductive argument similar to that of
[1, proposition 1]: in the intervals between point occurrences
of the space-time point process, $p_t(X \mid \mathcal{B}_t)$
evolves according to the first two terms on the right side
of (11); this is simply Kushner's equation for the linear system
(1a) with linear observations (1b), and is known to yield a
conditional density $p_t(X \mid \mathcal{B}_t)$ that is Gaussian with mean
$\hat{x}_t$ and covariance $\Sigma_t$ satisfying (6) and (7) or (9),
respectively, with the last term on the right side of each deleted.
At those instants when a point occurs in the space-time
point process, a jump occurs in $p_t(X \mid \mathcal{B}_t)$ because of the
last term on the right side of (11). However, it turns out
that $p_t(X \mid \mathcal{B}_t)$ remains Gaussian after this jump because it
was Gaussian before the jump and because the spatial
intensity $\gamma_t(r, X)$ is Gaussian. As in [1], calculation of the
last term on the right side of (11) shows that the jump in
the conditional mean is given by the last term on the right
side of (6), the jump in conditional covariance by the last
term on the right side of (7), and the jump in the inverse
of the conditional covariance by the last term on the right
side of (9).
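The three jump terms are mutually consistent, which can be checked directly in the scalar case. The sketch below is illustrative only: it assumes a scalar state and a scalar spatial observation, and the function name and argument names are hypothetical. It applies the gain (8) and the last terms of (6) and (7) at a single point occurrence.

```python
def jump_update(x_hat, sigma, h, r_var, r_obs):
    """Apply one point-occurrence jump to a scalar conditional mean and variance.

    x_hat, sigma : conditional mean and variance just before the point occurs
    h, r_var     : scalar H(t) and R(t)
    r_obs        : observed spatial location r of the point
    """
    gain = sigma * h / (h * sigma * h + r_var)   # M_t of (8)
    x_new = x_hat + gain * (r_obs - h * x_hat)   # jump term of (6)
    sigma_new = sigma - gain * h * sigma         # jump term of (7)
    return x_new, sigma_new
```

For any positive `sigma` and `r_var`, the returned variance satisfies `1/sigma_new - 1/sigma == h*h/r_var` up to rounding, which is the scalar form of the jump term $H'R^{-1}H$ in (9).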

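Finally, to prove that $f_t[\mu \mid \mathcal{B}_t] = f_t[\mu \mid \mathcal{T}_t]$, let $x^*$
satisfy $dx_t^* = F(t)x_t^*\,dt + G(t)u_t\,dt$; $x_0^* = E[x_0]$. Then
$\tilde{x}_t = x_t - x_t^*$ and $\tilde{z}_t$, defined by
$d\tilde{z}_t = dz_t - C(t)x_t^*\,dt$, satisfy

$$d\tilde{x}_t = F(t)\tilde{x}_t\,dt + V(t)\,dv_t, \qquad d\tilde{z}_t = C(t)\tilde{x}_t\,dt + dw_t \tag{12a}$$

with $\tilde{x}_0 \sim N(0, \Sigma_0)$ and $\tilde{z}_0 = 0$, while $\tilde{r} = r - Hx^*$ has
spatial intensity

$$\gamma_t(\tilde{r}, \tilde{x}_t) \sim N(H(t)\tilde{x}_t, R(t)). \tag{12b}$$

Let $\tilde{\mathcal{Z}}_t$ be the $\sigma$-algebra generated by $\tilde{z}$ over $[0,t)$ and let
$\tilde{\mathcal{N}}_t$ be that generated by the space-time process that is
obtained from the original one by leaving $N$ unchanged
and replacing $r$ by $r - Hx^*$. Then, under the assumption
that $u_t$ is also $\tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{N}}_t$-measurable, an argument that
parallels the proof of Lemma 1 in [7] shows that
$\mathcal{Z}_t \vee \mathcal{N}_t = \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{N}}_t$. This assumption is
discussed shortly. Thus, it is equivalent to prove that
$f_t[\mu \mid \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{N}}_t] = f_t[\mu \mid \mathcal{T}_t]$. Now,
because $\mu$ and $N$ are independent of $x_0$, $v$, and $w$, so also
are they independent of $\tilde{x}$ and $\tilde{z}$; thus, the joint density of
$\mu$ and the event that $t_1, \ldots, t_k$ are the occurrence times
of $N$ over $[0,t)$ satisfies
$f[\mu, t_1, \ldots, t_k \mid \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t] = f[\mu, t_1, \ldots, t_k]$,
where $\tilde{\mathcal{X}}_t$ is the $\sigma$-algebra generated by
$\tilde{x}$ over $[0,t)$. Equivalently,

$$f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t]\,f[t_1, \ldots, t_k \mid \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \mathcal{T}_t]\,f[t_1, \ldots, t_k].$$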

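Because $N$, $\tilde{x}$ and $\tilde{z}$ are independent,
$f[t_1, \ldots, t_k \mid \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t] = f[t_1, \ldots, t_k]$, and therefore

$$f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \mathcal{T}_t]. \tag{12c}$$

Also, we have from Bayes' rule that

$$f[\mu \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t]\;f[\tilde{r}_1, \ldots, \tilde{r}_{N_t} \mid \mu,\, \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \tilde{\mathcal{N}}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t]\;f[\tilde{r}_1, \ldots, \tilde{r}_{N_t} \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t],$$

since both equal
$f[\mu, \tilde{r}_1, \ldots, \tilde{r}_{N_t} \mid \mathcal{T}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t]$.
Now the second factor on each side equals

$$\prod_{i=1}^{N_t} \gamma_{t_i}(\tilde{r}_i, \tilde{x}_{t_i}),$$

and cancelling these and combining the result with (12c)
yields

$$f[\mu \mid \tilde{\mathcal{N}}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t] = f[\mu \mid \mathcal{T}_t].$$

Recalling that $\mathcal{T}_t \subset \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{N}}_t \subset \tilde{\mathcal{N}}_t \vee \tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{X}}_t$,
the desired result then follows immediately.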

Remark 1: The technical assumption that $u_t$ is
$\tilde{\mathcal{Z}}_t \vee \tilde{\mathcal{N}}_t$-measurable, which is required for the proof that
$f_t[\mu \mid \mathcal{B}_t] = f_t[\mu \mid \mathcal{T}_t]$, also arises as a sufficient
condition in [7]. A generalization of [7, theorem 3] shows that this will
be the case if $u$ is generated from the past of $z$, $N$, and $r$ or
$\tilde{z}$, $N$, and $\tilde{r}$ using a suitably smooth control law.
Specifically, it will be so if $u$ is generated as a Lipschitz function
of the state of a suitably smooth finite-dimensional system;
included here, in particular, is a control $u_t$ so generated
from $\hat{x}_t$ of (6), which is of interest because this is the
case for the optimum control found later.

Remark 2: The stochastic differential equations for $\hat{x}_t$
and $\Sigma_t$ given in Theorem 1 admit an intuitively simple
interpretation. In the intervals between point occurrences
of the space-time process, the problem reduces to one of
estimating $x_t$ from the observations $z$; the nonoccurrence
of further points in these intervals provides information
about $\mu_t$ but none about $x_t$ because of the separability (2)
of $\lambda$ and our standing assumptions concerning independence.
Thus, during these intervals we are left with the
standard Kalman filtering problem of estimating the state
of the linear system (1a) from the observations (1b); if $x_t$
is conditionally Gaussian at the beginning of each such
interval, it remains so throughout with mean and covariance
which evolve according to (6) and (7) or (9) with the
last term deleted in each. (This, of course, is also reflected
in the equation (11) for the conditional density reducing
to Kushner's equation during these intervals.) We now
observe that $x_t$ is, in fact, conditionally Gaussian at the
beginning of each such interval because it is at $t = 0$ and
because it remains Gaussian after each point occurrence;
indeed, at each occurrence $(t,r)$ of the space-time point
process the spatial observation $r$ is an independent observation
on a Gaussian random variable with mean $Hx_t$
and covariance $R$; this is equivalent to a discrete observation
of the form

$$r = H(t)x_t + \xi$$

where $\xi \sim N(0,R)$ is independent of $x$ and $z$. Thus, from
standard estimation theory for Gaussian random variables
[e.g., 10], the conditional density remains Gaussian and
the change in conditional mean and covariance of $x_t$ after
accounting for this new observation are

$$d\hat{x}_t = \hat{x}_{t+} - \hat{x}_t = \Sigma_tH'[H\Sigma_tH' + R]^{-1}(r - H\hat{x}_t) = M_t(r - H\hat{x}_t)$$

$$d\Sigma_t = \Sigma_{t+} - \Sigma_t = -M_tH\Sigma_t$$

$$d\Sigma_t^{-1} = \Sigma_{t+}^{-1} - \Sigma_t^{-1} = H'R^{-1}H.$$

Of course, this term is to be included only when an
occurrence takes place at $(t,r)$; multiplying each of these
calculation that

$$S(t) \triangleq E[(x_t - x_t^*)(x_t - x_t^*)'] \tag{17}$$

satisfies the linear matrix differential equation

$$\dot{S} = (F - LC)S + S(F - LC)' + VV' + LL' + \bar{\mu}\{M[HSH' + R]M' - MHS - SH'M'\}, \qquad S(0) = \mathrm{cov}[x_0] \tag{18}$$

where we have suppressed the common argument $t$ of all
entries, and where $\bar{\mu}(t) = E[\mu_t]$. Because all coefficients in
(18) are uniformly bounded, a unique solution to (18)
exists for all $t \in [0,\infty)$. We thus have proved Theorem 3.

Theorem 3: For any uniformly bounded $M(\cdot)$ and
$L(\cdot)$, the mean-square performance (17) of the estimator
(16) satisfies the linear matrix differential equation (18),
and this is a matrix ordering upper bound on $E[\Sigma_t]$, i.e.,
for all $t \in [0,\infty)$,

$$\bar{\Sigma}(t) \triangleq E[\Sigma_t] \le S(t) \tag{19}$$

in the sense that $S(t) - \bar{\Sigma}(t)$ is nonnegative definite.

We now show that there exists a choice of $M(\cdot)$ and
$L(\cdot)$ in (16) for which the corresponding mean-square
performance given by (18) lies at all times below that for
any other choice.

Theorem 4: Let $S^*(\cdot)$, $M^*(\cdot)$, and $L^*(\cdot)$ satisfy

$$\dot{S}^* = FS^* + S^*F' + VV' - S^*C'CS^* - \bar{\mu}S^*H'[HS^*H' + R]^{-1}HS^*, \qquad S^*(0) = \mathrm{cov}[x_0] \tag{20}$$

$$M^* = S^*H'[HS^*H' + R]^{-1}, \qquad L^* = S^*C' \tag{21}$$

where the common argument $t$ of all entries in (20) and
(21) is suppressed. Let $S(\cdot)$ be the solution to (18) for
some arbitrary $M(\cdot)$ and $L(\cdot)$. Then, for all $t \in [0,\infty)$,

$$E[\Sigma_t] \le S^*(t) \le S(t) \tag{22}$$

and $S^*(t) = E\{(x_t - x_t^*)(x_t - x_t^*)'\}$ is the mean-square
performance of the bound-minimal estimator

$$dx_t^* = Fx_t^*\,dt + Gu_t\,dt + S^*C'[dz_t - Cx_t^*\,dt] + S^*H'[HS^*H' + R]^{-1}\int_{R^m}[r - Hx_t^*]\,N(dt \times dr). \tag{23}$$

Proof:  Completing  the  square  on  the  right  side  of 
(18)  yields 

S - £S  + VV  - sees  - iiSH'l  HSH+  R]~'hS 

+ (L-Se'){L-Se'y  + ji[M-SH'(HSH'+R)~'] 
■(HSH'+  R)[m- SH’(HSH’+  R)~'j\ 

S(0)-cov[xo].  (24) 


For given $S(t)$, the right side of (24) is clearly minimized
by making the last two nonnegative definite terms 0, in
which case the right side of (24) reduces to that of (20)
while the minimizing choices of $M(t)$ and $L(t)$ are given
by (21). It remains to show that this instantaneous ordering
on the time derivative produces a permanent ordering
of the solutions over $[0, T]$, i.e., that the solution to (20)
lies at all times below that of (24) in the matrix ordering.
This is readily accomplished by using [9, lemma 1] after
appropriate modifications to reflect that initial conditions,
rather than final conditions, are of interest here. This
means that in [9, lemma 1] the left sides of (*) and (**)
should be replaced by $+\dot V$ and $+\dot Y$, respectively, and all
time orderings $0 \le t \le \tau \le T$ replaced by $0 \le s \le t \le T$.
Then, letting (24) play the role of (*) and (20) the role of
(**) and checking conditions 1)-4) of [9, lemma 1] we
have: 1) and 2) are trivial under our standing assumptions;
3) holds because, by a subsidiary application of [9,
lemma 1], the solution to (20) lies at all times above that
of the Riccati equation

$$\dot\Gamma = F\Gamma + \Gamma F' + VV' - \Gamma[C'C + \bar\mu H'R^{-1}H]\Gamma, \quad \Gamma(0) = \mathrm{cov}[x_0] \tag{25}$$

for which 3) is known to hold; finally, 4) holds for (18)
and therefore (24) because if $S_1(t)$ and $S_2(t)$ are the
respective solutions to (18) with initial conditions $S_1(0)$
and $S_2(0)$, $S_1(0) \ge S_2(0)$, then $S_1(t) \ge S_2(t)$ for all $t$, since

$$\dot S_1 - \dot S_2 = (F - LC - \bar\mu MH)(S_1 - S_2) + (S_1 - S_2)(F - LC - \bar\mu MH)' + \bar\mu MH(S_1 - S_2)H'M', \quad (S_1 - S_2)(0) \ge 0$$

and the solution to this lies at all times above the (identically
zero) solution to

$$\dot Y = (F - LC - \bar\mu MH)Y + Y(F - LC - \bar\mu MH)', \quad Y(0) = 0$$

by a further subsidiary application of [9, lemma 1].
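
The ordering in (22) can be illustrated numerically. The following is a minimal sketch, assuming a scalar model with constant coefficients: the values of F, C, V, H, R, the constant mean intensity MU, and the gains L = 0.3, M = 0.2 are illustrative assumptions that do not come from the text, and forward-Euler integration stands in for an exact solution of (18) and (20).

```python
def integrate(rhs, s0, t_end, dt=1e-3):
    """Forward-Euler integration of a scalar ODE ds/dt = rhs(s); returns the path."""
    s, path = s0, [s0]
    for _ in range(int(round(t_end / dt))):
        s += dt * rhs(s)
        path.append(s)
    return path

# Hypothetical scalar model (illustrative values only).
F, C, V, H, R, MU, S0 = -0.5, 1.0, 1.0, 1.0, 0.5, 2.0, 1.0

def rhs_18(L, M):
    """Right side of (18) for fixed deterministic scalar gains L and M."""
    def rhs(S):
        return (2.0 * (F - L * C) * S + V * V + L * L
                + MU * (M * (H * S * H + R) * M - 2.0 * M * H * S))
    return rhs

def rhs_20(S):
    """Right side of (20), i.e. (18) evaluated at the minimizing gains (21)."""
    return (2.0 * F * S + V * V - (S * C) ** 2
            - MU * (S * H) ** 2 / (H * S * H + R))

s_arbitrary = integrate(rhs_18(L=0.3, M=0.2), S0, t_end=5.0)
s_minimal = integrate(rhs_20, S0, t_end=5.0)

# Smallest value of S(t) - S*(t) along the path; nonnegative if the ordering holds.
gap = min(a - b for a, b in zip(s_arbitrary, s_minimal))
```

Both paths start from the same initial value, so the gap is zero at t = 0 and, in this example, stays nonnegative afterwards.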

Remark 5: The evaluation of the performance of the
suboptimum estimator (16) is a second-order analysis that
uses only the means and covariances of the various random
variables and processes and makes no use of the
Gaussianness of $x_0$, $v$, $w$, and $r$. Thus, the results of this
section remain valid if $v$ and $w$ are replaced by normalized
uncorrelated-increment processes that are uncorrelated
with each other and with $x_0$ and $r$, with $x_0$ having
any distribution with mean $\bar x_0$ and covariance $\Sigma_0$ and the
spatial intensity $\gamma_t(r, x_t)$ being any distribution with mean
$Hx$ and covariance $R$ such that $r$ is uncorrelated with $x_0$.
The bound-minimal estimator (23) can then be viewed as
the best estimator in the family (16). These estimators are
bilinear because of the product $r\,N(dt \times dr)$ in the last
term, though they might also be considered to be in a
sense linear, to the extent that $N(dt \times dr)$ merely signals
the arrival of a spatial observation $r$ which, as with $z$, is
utilized linearly in the production of $x^*$.
expressions by $N(dt \times dr)$ and integrating over $r$ takes
care of this, and constitutes the last term in (6), (7) and
(9), respectively.

Remark 3: It was observed in the proof that $p_t(x \mid \mathcal{B}_t \vee \mathcal{M})$
does not depend on $\mathcal{M}$, and thus $x_t$ and $\mu$ are
conditionally independent given $\mathcal{B}_t$; in particular, $x_t$ and
$\mu_t$ are conditionally independent given $\mathcal{B}_t$, and thus the
joint problem of estimating $x_t$ and $\mu_t$ given $\mathcal{B}_t$ separates
into two separate problems, one of estimating $x_t$ and one of
estimating $\mu_t$.

Furthermore, the final part of Theorem 1 establishes that
the latter estimate depends only on the Poisson counting
process $N$ and does not depend on $z$ or the spatial
locations of the points. This latter estimation problem
with various models for $\mu_t$ is examined, for example, in [6].

Theorem 2: The unique admissible control $u^o$ that
minimizes the cost (5) is given by

$$u_t^o = -P^{-1}(t)G'(t)K(t)\hat x_t \triangleq -L(t)\hat x_t \tag{13}$$

where $\hat x_t = E[x_t \mid \mathcal{B}_t]$ satisfies the finite-dimensional nonlinear
stochastic differential equation (6) with $\Sigma_t$ given by
(7), and the $n \times n$ symmetric nonnegative definite matrix
$K(t)$ satisfies the Riccati equation

$$\dot K(t) = -K(t)F(t) - F'(t)K(t) + K(t)G(t)P^{-1}(t)G'(t)K(t) - Q(t), \quad K(T) = S. \tag{14}$$

The corresponding minimum value of $J$ is

$$J[u^o] = E\{x_0'K(0)x_0\} + \int_0^T \mathrm{tr}[KGP^{-1}G'K\,E\{\Sigma_t\} + KVV']\,dt, \tag{15}$$

where tr denotes trace.

Proof: According to Åström [8], $J[u]$ can be rewritten
as

$$J[u] = E\Big\{\int_0^T \|u_t + L(t)\hat x_t\|_P^2\,dt\Big\} + [\text{right side of (15)}]$$

where $\hat x_t = E[x_t \mid \mathcal{B}_t]$ and $\|y\|_P^2 = y'Py$. The first term on
the right side is nonnegative, and zero if and only if
$u_t = -L(t)\hat x_t$. Thus, (13) gives the unique optimum control
provided the right side of (15) is invariant under changes
in $u$. The only way for (15) to be $u$-dependent is through
$E\{\Sigma_t\}$, and (7) shows that the only possibility for $\Sigma_t$ to
vary with $u$ is via $N_t$. But $N_t$ is a Poisson counting process
with rate $\mu_t$, and both $N_t$ and $\mu_t$ are specified at the outset
as mappings on $(\Omega, \mathcal{F}, P)$ without any reference to $u$.
Hence $\Sigma_t$ and, therefore, $E\{\Sigma_t\}$ and the right side of (15)
are invariant under changes in $u$, and the proof is complete.
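
As an illustration of how the control gain in Theorem 2 would be computed, the following minimal sketch integrates a scalar, constant-coefficient version of the Riccati equation (14) backward from the terminal condition and then applies the certainty-equivalent law (13). The function names, the numerical values, and the use of Euler steps are assumptions made only for this example.

```python
def riccati_backward(F, G, P, Q, S_T, T, dt=1e-3):
    """Scalar version of (14): integrate from K(T) = S_T back to t = 0 by Euler steps."""
    K = S_T
    for _ in range(int(round(T / dt))):
        K_dot = -2.0 * F * K + K * G * (1.0 / P) * G * K - Q
        K -= dt * K_dot  # move from time t to time t - dt
    return K

def control(K, G, P, x_hat):
    """Certainty-equivalent law (13): u = -P^{-1} G' K x_hat (scalar case)."""
    return -(1.0 / P) * G * K * x_hat

# Hypothetical plant and cost weights (illustrative values only).
K0 = riccati_backward(F=-1.0, G=1.0, P=1.0, Q=3.0, S_T=0.0, T=20.0)
u0 = control(K0, G=1.0, P=1.0, x_hat=2.0)
```

For these values the stationary solution of the scalar Riccati equation is K = 1, so over a long horizon K(0) should be close to 1 and the control close to minus the state estimate. Only the estimate enters the law; the gain itself does not depend on the observations.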

Remark 4: Theorem 2 shows that the solution to this
stochastic control problem can be realized with a separated
estimator-controller in which the estimator is nonlinear,
mean-square optimal, and finite-dimensional and
the controller is the certainty-equivalent linear control law
(i.e., the optimum linear control law for the deterministic
problem in which $x_0 = E[x_0]$, $v_t = E[v_t] = 0$, and $x_t$ is
known exactly). This result, therefore, includes as special
cases the familiar linear-quadratic-Gaussian "separation
theorem" (where the space-time point-process observations
are absent) (e.g., [8]) and the similar results in [1]-[3]
for restricted versions of the space-time point-process
observations.

We observe that $\Sigma_t$, the conditional covariance of $x_t$
given $\mathcal{B}_t$, is not precomputable because the last term on
the right side of (7) depends on the particular realization
of the counting process $N_t$. One is, therefore, led to
consider $\bar\Sigma(t) = E[\Sigma_t]$, both as a natural measure of estimation
performance in its own right and as the particular
measure of estimation performance that determines the
optimum control cost (15). However, while $\bar\Sigma(t)$ is deterministic
and in principle can be precalculated, this
calculation is infinite-dimensional. One way of seeing this
is to observe that an attempt to calculate $\bar\Sigma(t)$ by taking
expectations of both sides of (7) is complicated by the last
term on the right side, which requires the calculation of
$E\{\Sigma H'[H\Sigma H' + R]^{-1}H\Sigma\}$. While a differential equation
for this can be written down, it, in turn, requires expectations
of additional nonlinear functions of $\Sigma_t$, and so on ad
infinitum in a mushrooming requirement for additional
terms that is familiar from other nonlinear filtering situations.
Accordingly, we turn our attention to deriving
easily-computed upper and lower bounds on $\bar\Sigma(t)$. These
estimation bounds then directly imply upper and lower
bounds on the optimum control performance (15).

IV.  Suboptimum  Estimators  and  Upper  Bounds 

Our approach to finding easily-computed upper bounds
on $E[\Sigma_t]$ is to examine a parametrized family of suboptimum
estimators whose mean-square performance can
easily be calculated exactly. For each suboptimum estimator
the corresponding mean-square error is then trivially a
matrix-ordering upper bound on $E[\Sigma_t]$. Furthermore, we
show that there exists a suboptimum estimator in this
family whose mean-square performance is at all times
smaller than that of any other, thus providing a minimal
upper bound within this class.

Motivated by the form of the optimum estimator (6),
we consider the family of suboptimum estimators

$$dx_t^* = F(t)x_t^*\,dt + G(t)u_t\,dt + L(t)[dz_t - C(t)x_t^*\,dt] + \int M(t)[r - H(t)x_t^*]\,N(dt \times dr) \tag{16}$$

parametrized by the deterministic uniformly-bounded $n \times m$-
and $n \times l$-matrix valued time functions $L(\cdot)$ and $M(\cdot)$.
This family does not include the optimum estimator (6) in
which $M_t$, $L_t = \Sigma_t C'$ are random matrices depending on $N$
through $\Sigma$. Apart from the requirement that $M(\cdot)$, $L(\cdot)$ be
deterministic, the suboptimum estimator (16) and the optimum
estimator (6) share the same structure. The nonrandomness
of $M$, $L$ enables us to write down an ordinary
$n \times n$-matrix differential equation for the mean-square
error of the suboptimum estimator (16). Indeed, subtracting
(16) from (1a), it follows directly by straightforward
V. Estimation Lower Bounds and Control
Bounds

Theorem 5: Let $S_*$ be the solution to (25). Then for all
$t \ge 0$, $S_*(t)$ is a matrix-ordering lower bound on $E[\Sigma_t]$,

$$S_*(t) \le E[\Sigma_t]. \tag{26}$$

Proof: We have from (9) that $\overline{\Sigma^{-1}} = E[\Sigma^{-1}]$ satisfies

$$\frac{d}{dt}\overline{\Sigma^{-1}} = -\overline{\Sigma^{-1}}F - F'\overline{\Sigma^{-1}} - E[\Sigma^{-1}VV'\Sigma^{-1}] + C'C + \bar\mu H'R^{-1}H, \quad \overline{\Sigma^{-1}}(0) = (\mathrm{cov}[x_0])^{-1}$$

$$\phantom{\frac{d}{dt}\overline{\Sigma^{-1}}} = -\overline{\Sigma^{-1}}F - F'\overline{\Sigma^{-1}} - \overline{\Sigma^{-1}}VV'\overline{\Sigma^{-1}} + C'C + \bar\mu H'R^{-1}H - \Delta \tag{27}$$

where

$$\Delta = E[\Sigma^{-1}VV'\Sigma^{-1}] - \overline{\Sigma^{-1}}VV'\overline{\Sigma^{-1}} = \mathrm{cov}[\Sigma^{-1}V] \ge 0. \tag{28}$$

It then follows from [9, lemma 1] that $\overline{\Sigma^{-1}}$ lies at all times
below the solution to

$$\dot Z = -ZF - F'Z - ZVV'Z + \bar\mu H'R^{-1}H + C'C, \quad Z(0) = (\mathrm{cov}[x_0])^{-1}. \tag{29}$$

Thus,

$$Z(t) \ge \overline{\Sigma^{-1}}(t) = E[\Sigma_t^{-1}] \ge (E\Sigma_t)^{-1}, \tag{30}$$

the last inequality being a matrix version of Jensen's
inequality proved in the Appendix. Taking inverses of (30)
and noting that if $S_*$ is the solution to (25), then $S_*^{-1}$ is
the solution to (29), we have the desired result (26).

We remark in passing that (25) gives the covariance of
the optimum estimator when the space-time point-process
observations are replaced by continuous observations of
the form

$$dy_t = Hx_t\,dt + \{\bar\mu^{-1}R\}^{1/2}\,dn_t$$

where $n_t$ is a Wiener process independent of $x_0$, $v$, and $w$.

Remark 6: Comparing (20) for the minimal upper
bound with (25) for the lower bound, we see that these
two bounds will be close to each other and thus to $E[\Sigma_t]$
if $HS^*H'$ is small compared with $R$ (or, equivalently, if
$H'R^{-1}H$ is small compared with $(S^*)^{-1}$). Both bounds will
also be close to each other and to the optimum performance
if the mean intensity $\bar\mu$ is small. These are discussed
later in terms of our motivating example.

Once we have deduced upper and lower bounds on the
estimation performance $E[\Sigma_t]$, corresponding bounds on
the optimum control performance follow directly by substitution
of these bounds for $E[\Sigma_t]$ in (15).

Theorem 6: Upper and lower bounds on the optimum
control performance $J[u^o]$ of (15) are

$$E\{x_0'K(0)x_0\} + \int_0^T \mathrm{tr}[KGP^{-1}G'KS_* + KVV']\,dt \le J[u^o] \le E\{x_0'K(0)x_0\} + \int_0^T \mathrm{tr}[KGP^{-1}G'KS^* + KVV']\,dt$$

where $S_*$ is the solution to (25) and $S^*$ is the solution to
(20).
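
The closeness of the two estimation bounds for small mean intensity can be checked numerically. The sketch below assumes a scalar, constant-coefficient model and integrates the lower-bound equation (25) and the upper-bound equation (20) by forward Euler; all default parameter values are illustrative assumptions, not values from the text.

```python
def euler(rhs, s0, t_end, dt=1e-3):
    """Forward-Euler integration of ds/dt = rhs(s); returns the value at t_end."""
    s = s0
    for _ in range(int(round(t_end / dt))):
        s += dt * rhs(s)
    return s

def bounds(mu, F=-0.5, C=1.0, V=1.0, H=1.0, R=0.5, s0=1.0, t_end=5.0):
    """Return (lower, upper) scalar bounds at t_end for constant mean intensity mu."""
    # Lower bound: scalar form of the Riccati equation (25).
    lower = euler(lambda S: 2 * F * S + V * V - S * S * (C * C + mu * H * H / R),
                  s0, t_end)
    # Upper bound: scalar form of the bound-minimal equation (20).
    upper = euler(lambda S: 2 * F * S + V * V - (S * C) ** 2
                  - mu * (S * H) ** 2 / (H * S * H + R),
                  s0, t_end)
    return lower, upper

lo, hi = bounds(mu=2.0)
lo_small, hi_small = bounds(mu=0.01)
lo_zero, hi_zero = bounds(mu=0.0)
```

With no point-process observations (mu = 0) the two equations coincide, and the gap between the bounds shrinks as mu decreases, in line with Remark 6.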


VI.  Discussion 

The  above  estimator-controller  solution  extends  results 
in [2] and [3] to include a more general form of observation.
Just as with the observation model in [2] and [3], this
more general observation is motivated by communication
systems  that  employ  a narrow  beam  of  light  as  a carrier, 
by  star  tracking  systems,  and  by  infrared  tracking  systems, 
all  of  which  have  a requirement  for  position  sensing  and 
active  tracking  to  maintain  optical  alignment  in  the  pres- 
ence of  a variety  of  disturbances.  We  shall  indicate  how 
the  models  of  [2,  sec.  4]  and  [3,  sec.  4]  are  usefully 
extended  by  this  more  general  observation.  The 
estimator-controller  solution  of  Theorem  2 provides  a 
possible  tool  for  the  design  of  an  optical  tracking  system 
under  the  conditions  indicated  below,  and  the  perfor- 
mance bounds  of  Sections  IV  and  V provide  the  means 
for  predicting  the  performance  of  such  designs. 

Let $I(t, r)$ denote the light intensity at time $t \in [0, \infty)$
and position $r \in \mathcal{A}$ of an optical field incident on the
photoemissive surface of a two-dimensional photodetector
on boresight and without any motions. Here, $\mathcal{A}$ is a
subregion of $\mathbb{R}^2$ corresponding to the photoemissive
surface. We assume a Gaussian intensity-profile

$$I(t, r) = I_0(t)\exp\{-\tfrac12 r'R^{-1}(t)r\}.$$

Vibration, beam steering due to propagation of the light
beam through atmospheric turbulence, and other effects
cause the spot of light on the photoemissive surface to
move about in a random fashion and to fluctuate randomly
in optical intensity. In this case, the intensity
profile becomes

$$I(t, r - y_m(t)) = I_0(t)\exp\{-\tfrac12[r - y_m(t)]'R^{-1}(t)[r - y_m(t)]\}$$

where $y_m(t)$ models the random motions, and $I_0(t)$ is a
random process (e.g., a lognormal process) that models
random intensity fluctuations. We assume that $\{y_m(t); t \ge 0\}$
is derived from a Gaussian diffusion satisfying

$$dx_m(t) = F_m(t)x_m(t)\,dt + V_m(t)\,dv_m(t),$$
$$y_m(t) = H_m(t)x_m(t),$$

where $\{v_m(t); t \ge 0\}$ is a standard Wiener process. The
fading process $\{I_0(t); t \ge 0\}$ is assumed to be independent
of motion processes but is otherwise arbitrary. The purpose
of the tracking controller is to compensate for these
random motions and random fading in order to maintain
optical alignment. Thus, in the presence of a controller to
position telescopes, mirrors, or other pointing devices, the
intensity becomes

where $y_p(t) - y_m(t)$ is the tracking error. Ideally, this error
should be zero, but this cannot be accomplished for two
reasons: the position error $y_m(t)$ is unknown and must be
estimated from data available at the photodetector output,
and the tracking devices will have some inertia so that
$y_m(t)$ cannot be tracked instantaneously even if it were
known. We model the tracking devices by a linear
stochastic plant

yp(i)= 

where $u(t)$ is the input to the tracking devices from the
tracking controller, and $\{v_p(t); t \ge 0\}$ is a standard Wiener
process modeling local disturbances such as those due to
vibration.

Photoelectron conversions take place in the photoemissive
surface at a rate proportional to the incident light
intensity [3]. Thus, the photoelectron conversion rate has
the form of $\mu_t\gamma_t(r, x_t)$ for $(t, r) \in [0, \infty) \times \mathcal{A}$ with $\mu_t$ an
appropriately scaled version of $I_0(t)$, and $x$ is the vector
obtained by adjoining $x_m$ and $x_p$, and $H$ is obtained from
$H_m$ and $H_p$ in an obvious way.

The problem of optical tracking is to follow the position
of maximum light intensity at time $t$ in terms of both
photoelectron conversions observed on $[0, t) \times \mathcal{A}$ and observations
of the plant state $x_p$ obtained with sensors
located at the tracking devices. These latter observations
are modeled according to (1b) so as to account for sensor
noise. Except for the finiteness of $\mathcal{A}$, this problem is
identical to the control problem studied above when photoelectron
conversions are identified as space-time points.
An approximation that appears reasonable when the
beam is small and the tracking errors are small (i.e., fine
tracking mode rather than an acquisition mode) compared
to the size of the photoemissive surface is to replace $\mathcal{A}$ by
$\mathbb{R}^2$. With this approximation, the optical tracking problem
is solved by the result in Theorem 2.

It is important to note that according to Remark 3 and
Theorem 2, the design of the tracking controller does not
depend in any way upon the source or nature of randomness
in $I_0(t)$. Thus, for example, the same design is obtained
if $I_0(t)$ is random due to atmospheric turbulence or
modulation by an information-bearing signal or a combination
of these.

The upper and lower bounds of Sections IV and V
provide a measure of the performance for the optical
position-sensing and tracking system derived from Theorem 2.
From Remark 6, the upper and lower bounds
merge when $HS^*H'$ is small compared to the beam
spread as measured by $R$. It is evident that the estimation
and  control  lower  bounds  derived  as  above  for  observa- 
tions of  each  photoelectron  conversion  are  also  lower 
bounds  for  both  optimal  and  suboptimal  trackers  that 
employ  observations  obtained  by  temporal  or  spatial 
averaging  as  would  be  obtained  using  photon  counting 
and  a quadrant  photomultiplier. 

We mention also that Segall in [11] has applied the
models  of  [1]  and  [3]  to  study  computer  communication 
networks.  The  upper  and  lower  bounds  on  performance 
that  we  have  derived  can  be  applied  in  this  context  as 
well. 

Appendix 

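Lemma: Let $W_1$ and $W_2$ be positive definite matrices,
and let $\gamma \in [0, 1]$. Then

$$[\gamma W_1 + (1 - \gamma)W_2]^{-1} \le \gamma W_1^{-1} + (1 - \gamma)W_2^{-1}, \tag{A1}$$

i.e., $W^{-1}$ is convex in a matrix sense. Furthermore, we
have

$$E[W^{-1}] \ge (E[W])^{-1} \tag{A2}$$

(cf. Jensen's inequality).

Proof: Let $x = [\gamma W_1 + (1 - \gamma)W_2]y$. Then

$$x'[\gamma W_1^{-1} + (1 - \gamma)W_2^{-1}]x = \gamma^3 y'W_1y + 2\gamma^2(1 - \gamma)y'W_2y + \gamma(1 - \gamma)^2 y'W_2W_1^{-1}W_2y + (1 - \gamma)\gamma^2 y'W_1W_2^{-1}W_1y + 2\gamma(1 - \gamma)^2 y'W_1y + (1 - \gamma)^3 y'W_2y.$$

Now $W_2W_1^{-1}W_2 \ge 2W_2 - W_1$ because
$(W_2 - W_1)W_1^{-1}(W_2 - W_1) \ge 0$; similarly,
$W_1W_2^{-1}W_1 \ge 2W_1 - W_2$. Substituting
these inequalities and simplifying yields

$$x'[\gamma W_1^{-1} + (1 - \gamma)W_2^{-1}]x \ge \gamma[(\gamma + 1 - \gamma)^2]y'W_1y + (1 - \gamma)[(1 - \gamma + \gamma)^2]y'W_2y$$
$$= \gamma y'W_1y + (1 - \gamma)y'W_2y$$
$$= x'[\gamma W_1 + (1 - \gamma)W_2]^{-1}x.$$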

From  this  (A2)  follows. 
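
A quick numerical spot-check of (A1) is sketched below for randomly generated 2x2 positive definite matrices, using only the standard library. The helper names, the random construction of the matrices, and the tolerance are assumptions introduced for this illustration; a passing check is of course not a proof.

```python
import random

def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def comb(g, a, b):
    """Convex combination g*a + (1-g)*b of two 2x2 matrices."""
    return [[g * a[i][j] + (1 - g) * b[i][j] for j in range(2)] for i in range(2)]

def is_nonneg_def(a, tol=1e-9):
    """A symmetric 2x2 matrix is nonnegative definite iff trace >= 0 and det >= 0."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return trace >= -tol and det >= -tol

def random_pd(rng):
    """B B' + I is symmetric positive definite for any real 2x2 matrix B."""
    b = [[rng.uniform(-2, 2) for _ in range(2)] for _ in range(2)]
    return [[sum(b[i][k] * b[j][k] for k in range(2)) + (1.0 if i == j else 0.0)
             for j in range(2)] for i in range(2)]

def check_a1(trials=1000, seed=0):
    """Return True if the right side of (A1) minus its left side is nonnegative definite in every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        w1, w2, g = random_pd(rng), random_pd(rng), rng.random()
        rhs = comb(g, inv2(w1), inv2(w2))
        lhs = inv2(comb(g, w1, w2))
        diff = [[rhs[i][j] - lhs[i][j] for j in range(2)] for i in range(2)]
        if not is_nonneg_def(diff):
            return False
    return True
```

Calling `check_a1()` exercises the inequality over many random pairs and mixing weights.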
ABSTRACT

The  cutoff  rate  is  derived  for  a digital  communication  system  employing 
an  optical  carrier  and  direct  detection.  The  coordinated  design  of  the  encoder, 
optical  modulator,  and  demodulator  is  then  studied  using  the  cutoff  rate  as  a 
performance  measure  rather  than  the  more  commonly  employed  error  probability. 
Modulator  design  is  studied  when  transmitted  optical  signals  are  subject  simul- 
taneously to  average  energy  and  peak  value  constraints.  Pulse-position  modula- 
tion is  shown  to  maximize  the  cutoff  rate  when  the  average  energy  constraint 
predominates,  and  the  best  signals  when  the  peak-value  constraint  predominates 
are  identified  in  terms  of  Hadamard  matrices.  A time-sharing  of  these  signals 
maximizes  the  cutoff  rate  when  neither  constraint  dominates  the  other.  Prob- 
lems of  efficient  energy  utilization,  choice  of  input  and  output  alphabet  dimen- 
sion, and  the  effect  of  random  detector  gain  are  addressed. 
I. INTRODUCTION

Our concern in this paper is with digital communication systems that
employ coherent light as a carrier and direct detection as the means to convert
the received optical field into an electrical signal for subsequent processing.
Communication systems of this type are discussed widely in the literature (see
[1]-[5] and references therein) and are of increasing importance in applications.

The optical portion of the overall system consists of the optical modulator,
optical channel, and optical detector shown schematically in the basic
information-theoretic model of the optical, digital-communication system of Fig. 1. Here,
$E(t,r)$ represents the temporally and spatially dependent complex envelope of the
optical field, and $N(t)$ represents the counting process associated with the
output of an ideal, direct-detection device. This counting process is assumed
to be an inhomogeneous Poisson process with rate function $\lambda(t) = s(t) + \lambda_0$,
where $\lambda_0$ represents the contribution to the total count rate due to dark current
in the detector. Also, $\lambda_0$ can account for background radiation when this is
characterized by many, weak modal-components [2,3]. The assumption that $N(t)$ is a Poisson
process is met to a close approximation on the free-space channel for coherent sources
[3]. In our model, the signal count-rate $s(t)$ is related to $E(t,r)$ according to

$$s(t) = (\eta/h\nu)\int_A |E(t,r)|^2\,dr, \tag{1}$$

where $\eta$ is the quantum efficiency of the detector, $h$ is Planck's constant, $\nu$
is the optical-carrier frequency, and $A$ is the active surface of the detector;
it is evident that $s(t)$ is nonnegative, which, of course, it must be as a rate
function.

We shall suppose that a code letter $x$ in Fig. 1 is drawn once each $T$
seconds from a $q$-ary alphabet $\mathcal{X} = \{X_1, X_2, \ldots, X_q\}$. We further suppose that
each demodulator-output letter $y$ is drawn from a $q'$-ary alphabet $\mathcal{Y} = \{Y_1, Y_2, \ldots, Y_{q'}\}$,
where in general $q' \ge 2$. Initially, we investigate "infinitely soft" decisions,
$q' = \infty$. The decoder output letters $\hat u$ supplied to the sink are reproductions of the
encoder input letters $u$ supplied by the source; these are presumed to be drawn
from a binary alphabet $U = \{0,1\}$. The rate of the coding system in terms of
the number of source digits for each channel letter will be denoted by $R$ bits
per channel use. This means that $R = R_sT$ if the source generates $R_s$ bits per
second.

The combination of the optical modulator, optical channel, optical detector,
and demodulator forms a discrete channel with a $q$-ary input alphabet $\mathcal{X}$ and $q'$-ary
output alphabet $\mathcal{Y}$. By virtue of the independent-increments property of the
Poisson process and the constancy of $\lambda_0$, this is a "constant, discrete,
memoryless channel" in the sense that the conditional probability that the channel output
sequence is $b_1b_2\ldots b_n$, where each $b_i$ is in $\mathcal{Y}$, given that the input sequence is
$a_1a_2\ldots a_n$, where each $a_i$ is in $\mathcal{X}$, factors into the $n$-fold product of the
per-letter transition probabilities according to

$$\Pr[y_1\ldots y_n = b_1\ldots b_n \mid x_1\ldots x_n = a_1\ldots a_n] = \prod_{i=1}^{n}\Pr[y_i = b_i \mid x_i = a_i]. \tag{2}$$

Furthermore, the per-letter transition probabilities are the same for any $T$
second use of the channel. Thus, if $p_{y|x}(y|x)$ denotes the per-letter transition
probability, the right side of (2) is $\prod_{i=1}^{n} p_{y|x}(b_i|a_i)$. The design of the
modulator and demodulator, of course, affects $p_{y|x}(y|x)$. We shall study the
design which makes $p_{y|x}(y|x)$ most favorable for a given optical channel and
detector. The coordination of this design with that of the encoder will also
be studied.


A quantity that reflects the influence of $p_{y|x}(y|x)$ on the quality of
a constant discrete memoryless channel is the cutoff rate $R_0$ defined by

$$R_0 = -\log_2\Big\{\min_Q\sum_{i=1}^{q}\sum_{j=1}^{q}Q(X_i)Q(X_j)\sum_{k=1}^{q'}[p_{y|x}(Y_k|X_i)\,p_{y|x}(Y_k|X_j)]^{1/2}\Big\}, \tag{3}$$

where $Q$ is a probability mass-function on $\mathcal{X}$. Wozencraft and Kennedy [6], in
1966, were first to argue in favor of the cutoff rate as a criterion for design
because it is the upper limit of code rates $R$ for which the average decoding
computation per source digit is finite when sequential decoding is used.
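
For a finite output alphabet the cutoff rate is straightforward to evaluate once the input distribution is fixed. The sketch below assumes the standard form of the definition, in which the double sum over code letters equals a sum over output letters of squared weighted square-root transition probabilities; it evaluates the expression for a given Q and does not carry out the minimization over Q. The function name and the binary symmetric channel used as an example are assumptions for illustration only.

```python
from math import log2, sqrt

def cutoff_rate(P, Q):
    """-log2 of sum_k (sum_i Q[i] * sqrt(P[i][k]))**2 for a fixed input law Q.

    P[i][k] is the probability of output letter k given input letter i.
    """
    total = 0.0
    for k in range(len(P[0])):
        total += sum(Q[i] * sqrt(P[i][k]) for i in range(len(Q))) ** 2
    return -log2(total)

# Hypothetical example: binary symmetric channel with crossover probability 0.1.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
r0_bsc = cutoff_rate(bsc, [0.5, 0.5])

# A noiseless binary channel has a cutoff rate of one bit per channel use.
r0_clean = cutoff_rate([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

For the binary symmetric channel with equiprobable inputs, this agrees with the closed form $1 - \log_2(1 + 2\sqrt{p(1-p)})$.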

Wozencraft and Kennedy also showed that there is a block code of rate $R$ and codeword
length $N$ such that the probability of error $\Pr(e)$ in decoding a sourceword of
length $K = NR$ is bounded according to

$$\Pr(e) \le 2^{-N(R_0 - R)} \quad \text{if } R < R_0. \tag{4}$$

Thus, for block codes, the single number $R_0$ provides a measure of both a range
of rates $R$ for which reliable communication is possible as well as the coding
complexity, as reflected by $N$, required to guarantee a specified block-error
probability. More recently, Viterbi [7] has shown for convolutional coding
and maximum-likelihood sequence decoding on the constant, discrete memoryless
channel that the error probability is upper bounded according to

$$\Pr(e) \le C\,L\,2^{-NR_0} \quad \text{if } R < R_0, \tag{5}$$

where $N$ is the constraint length of the convolutional code, $R$ is the code rate,
$L$ is the total number of source letters encoded, and $C$ is a weakly dependent
function of $R$ and not a function of $L$ and $N$. Thus, as with block codes, the
single number $R_0$ provides a measure of both reliable rates and code complexity.

Massey [8,9] made these observations first and has used them to make an eloquent
and persuasive argument for adopting $R_0$ as a modulator-demodulator design parameter
in place of the more commonly used error probability. In what follows, we shall
investigate some of the implications of attempting to maximize this parameter
for modulator-demodulator design for direct-detection, optical communication
systems.

II. $R_0$ for Infinitely Fine Quantization

In practice, the demodulator of Fig. 1 must quantize the point process
observed on $[0,T]$ in some fashion to produce one of the $q'$ output letters in $\mathcal{Y}$.
This might be accomplished, for example, by counting points in subintervals of
$[0,T]$, disregarding their times of occurrence within these subintervals, and
then comparing the subinterval counts to prescribed thresholds. Regardless of what
form of quantization is adopted, the finer it is, the larger will be the cutoff
rate $R_0$ of the resulting constant discrete memoryless channel. Thus, we consider
first the limiting situation of infinitely fine quantization, for which $q' = \infty$
and $R_0 = R_{0,\infty}$ is not degraded by quantization. Then, we consider the effect
of finite quantization.

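For a Poisson process with rate $\lambda(t)$, the probability of observing $n$ points
during $[0,T]$ in $n$ disjoint intervals $[t_1, t_1+\Delta t_1), [t_2, t_2+\Delta t_2), \ldots, [t_n, t_n+\Delta t_n)$
is approximated to $o(\max_i \Delta t_i)$ by

$$\Big(\prod_{i=1}^{n}\lambda(t_i)\Big)\exp\Big(-\int_0^T\lambda(t)\,dt\Big)\,\Delta t_1\Delta t_2\cdots\Delta t_n$$

for $n \ge 1$ and by

$$\exp\Big(-\int_0^T\lambda(t)\,dt\Big)$$

for $n = 0$. Consequently, for infinitely fine quantization, the summation,
call it $f(i,j)$, over $k$ in (3) becomes

$$f(i,j) = \exp\Big(-\tfrac12\int_0^T(\lambda_i(t) + \lambda_j(t))\,dt\Big)\Big[1 + \sum_{n=1}^{\infty}\int\!\cdots\!\int\prod_{k=1}^{n}[\lambda_i(t_k)\lambda_j(t_k)]^{1/2}\,dt_1\cdots dt_n\Big]$$

where $\lambda_i(t)$ and $\lambda_j(t)$ are the detection rates for code letters $X_i$ and $X_j$,
respectively, and the integration is over the region $0 \le t_1 \le t_2 \le \cdots \le t_n \le T$. By
extending this range of integration to $0 \le t_i \le T$ for $i = 1, 2, \ldots, n$, and dividing
by $n!$ to compensate for this extension, we obtain

$$f(i,j) = \exp\Big(-\tfrac12\int_0^T(g_i(t) - g_j(t))^2\,dt\Big),$$

where we define $g_i(t) = \lambda_i^{1/2}(t)$ and $g_j(t) = \lambda_j^{1/2}(t)$. Thus,

$$R_{0,\infty} = -\log_2\Big\{\min_Q\sum_{i=1}^{q}\sum_{j=1}^{q}Q(X_i)Q(X_j)\exp\Big(-\tfrac12\int_0^T(g_i(t) - g_j(t))^2\,dt\Big)\Big\}. \tag{6}$$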


This expression is identical to that obtained by Massey [8, eq.(4)] if
the signal $g_i(t)$ were to be observed in an additive white Gaussian noise of
unit intensity when $X_i$ is the code letter into the modulator. It is with this
expression that Massey established for the first time the $R_{0,\infty}$-optimality under an
average energy constraint of a simplex signal set for the additive white Gaussian
noise channel. However, the additional constraint $g_i(t) \ge \lambda_0^{1/2} \ge 0$ obtains here, so
Massey's argument does not hold for direct-detection optical-communication
systems and must be modified. This is accomplished as follows.


By defining

$$S = \sum_{i=1}^{q}Q^2(X_i)$$

and by using Jensen's inequality, Massey [8] shows from (6) that

$$R_{0,\infty} \le -\log_2\Big\{\min_Q\Big[S + (1 - S)\exp\Big(-\tfrac12(1 - S)^{-1}\sum_{i=1}^{q}\sum_{j \ne i}Q(X_i)Q(X_j)\int_0^T(g_i(t) - g_j(t))^2\,dt\Big)\Big]\Big\} \tag{7}$$

with equality holding if and only if the quantity

$$d_{ij}^2 = \int_0^T(g_i(t) - g_j(t))^2\,dt \tag{8}$$

is the same whenever $i \ne j$. It is evident from (7) that $R_{0,\infty}$ is a monotonically
increasing function of $d_{ij}^2$ for $i \ne j$. Thus, if $d^2$ denotes the maximum of
the $d_{ij}^2$ for $i \ne j$, there holds

$$R_{0,\infty} \le -\log_2\{\min_Q[S + (1 - S)\exp(-\tfrac12 d^2)]\}. \tag{9}$$

Furthermore, it is easily verified that the minimizing code letter distribution
in (9) is the uniform distribution $Q(X_i) = 1/q$ for $i = 1, 2, \ldots, q$.
As $S = 1/q$ for this distribution, (9) becomes

$$R_{0,\infty} \le \log_2 q - \log_2[1 + (q - 1)\exp(-\tfrac12 d^2)] \tag{10}$$

with equality holding if and only if $d_{ij}^2 = d^2$ whenever $i \ne j$.
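
The dependence of this bound on the signal separation can be tabulated directly. The sketch below assumes that the bound for an equidistant signal set with uniform code-letter probabilities reads $\log_2 q - \log_2[1 + (q-1)\exp(-d^2/2)]$, with $d^2$ the common squared distance between the square-root rate functions; the function name and the sample arguments are illustrative assumptions.

```python
from math import exp, log2

def r0_equidistant(q, d2):
    """Cutoff-rate bound, in bits per channel use, for q equidistant signals
    with common squared distance d2 and uniform code-letter probabilities."""
    return log2(q) - log2(1.0 + (q - 1) * exp(-0.5 * d2))

# Illustrative values: identical signals carry no information, widely
# separated signals approach log2(q) bits per channel use.
r0_none = r0_equidistant(4, 0.0)
r0_far = r0_equidistant(4, 1e3)
table = [(d2, r0_equidistant(8, d2)) for d2 in (0.5, 1.0, 2.0, 4.0, 8.0)]
```

The table makes the monotone growth in the squared distance visible, which is why the modulator design that follows concentrates on making the common distance as large as the constraints allow.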


III. Modulator Design Based on $R_{0,\infty}$

An optical modulator designed to produce a signal set $\mathcal{S} = \{E_1(t,r),
E_2(t,r), \ldots, E_q(t,r)\}$ such that $d_{ij} = d$ for $i \ne j$ and such that $d$ is as large
as possible produces the best overall performance for the digital optical
communication system as measured by $R_{0,\infty}$. Thus, we are motivated to examine
the maximization of $d$ subject to suitable constraints on signals in $\mathcal{S}$.

Associated with each signal set $\mathcal{S}$ is a derived signal set $\mathcal{G} = \{g_1(t), g_2(t), \ldots,
g_q(t)\}$ in which $g_i(t) = \lambda_i^{1/2}(t)$, where

$$\lambda_i(t) = (\eta/h\nu)\int_A|E_i(t,r)|^2\,dr + \lambda_0. \tag{11}$$

Note that signals in $\mathcal{G}$ satisfy $g_i(t) \ge g_{\min} = \lambda_0^{1/2}$.

This maximization problem is examined subject to additional constraints
on the average energy and the peak amplitude of signals in the transmitted
signal set $\mathcal{S}$. We assume that the average energy $E$ of signals in $\mathcal{S}$, defined
by

$$E = \frac{1}{q}\sum_{i=1}^{q}\int_0^T\!\!\int_A|E_i(t,r)|^2\,dr\,dt, \tag{12}$$

must satisfy

$$E \le E_{\max}, \tag{13}$$

where $E_{\max}$ is a prespecified maximum allowable average energy. Then the
average energy $E_g$ for signals in the derived signal set $\mathcal{G}$, defined by

$$E_g = \frac{1}{q}\sum_{i=1}^{q}\int_0^T g_i^2(t)\,dt, \tag{14}$$

satisfies

$$E_g - n = s \le s_{\max}, \tag{15}$$

where $s = \eta E/h\nu$ and $n = g_{\min}^2T = \lambda_0T$ are the average number of signal
counts and noise counts, respectively, per channel use, and where
$s_{\max} = \eta E_{\max}/h\nu$. We assume, further, that the amplitude $|E_i(t,r)|$ of each
signal in the transmitted-signal set $\mathcal{S}$ cannot exceed a prespecified maximum
value $P_{\max}$; that is,

$$|E_i(t,r)| \le P_{\max} \tag{16}$$

for $i = 1, 2, \ldots, q$, $0 \le t \le T$, and for all locations $r$ in the active surface of
the detector. Then each signal in the derived signal set $\mathcal{G}$ satisfies

$$g_{\min} \le g_i(t) \le g_{\max}, \tag{17}$$

where $g_{\min} = \lambda_0^{1/2}$ and $g_{\max} = [(\eta A/h\nu)P_{\max}^2 + \lambda_0]^{1/2}$.


For modulator design, we thus have the following optimization problem:
select signals in $\mathcal{G}$ to maximize

$$[q(q-1)]^{-1}\sum_{i=1}^{q}\sum_{j=1}^{q}\int_0^T(g_i(t) - g_j(t))^2\,dt \tag{18}$$

subject to the following constraints:

(i) equidistance constraint: the quantities $d_{ij}$ in (8) should be
the same whenever $i \ne j$.

(ii) average-energy constraint: equation (15) should be satisfied.

(iii) peak-amplitude constraint: equation (17) should be satisfied.

To simplify the development, we temporarily neglect the equidistance
constraint in formulating and solving this optimization problem. It will be
evident subsequently that among the solutions to the relaxed problem are ones
satisfying the equidistance constraint, and these are then solutions to the
fully constrained problem.


We find for $q' = \infty$ that the best choice of modulator design depends on the particular values of $q$, $T$, $g_{\min}$, $g_{\max}$, and $s_{\max}$, but whatever values these parameters may be, there are only three categories of best design. These are determined by the conditions


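$$s_{\max} \in \left[0,\ \tfrac{1}{q}(g_{\max}^2-g_{\min}^2)T\right], \qquad (19a)$$

$$s_{\max} \in \begin{cases}\left(\tfrac{1}{q}(g_{\max}^2-g_{\min}^2)T,\ \tfrac{1}{2}(g_{\max}^2-g_{\min}^2)T\right), & q \text{ even}\\[4pt] \left(\tfrac{1}{q}(g_{\max}^2-g_{\min}^2)T,\ \tfrac{q-1}{2q}(g_{\max}^2-g_{\min}^2)T\right), & q \text{ odd},\end{cases} \qquad (19b)$$

$$s_{\max} \in \begin{cases}\left[\tfrac{1}{2}(g_{\max}^2-g_{\min}^2)T,\ \infty\right), & q \text{ even}\\[4pt] \left[\tfrac{q-1}{2q}(g_{\max}^2-g_{\min}^2)T,\ \infty\right), & q \text{ odd}.\end{cases} \qquad (19c)$$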


We  say  that  the  "average  energy  constraint  predominates"  when  (19a)  holds, 
the  "peak-amplitude  constraint  predominates"  when  (19c)  holds,  and  that 
"neither  constraint  predominates"  when  (19b)  holds. 


Average Energy Constraint Predominates. We first give an upper bound on $\bar d^{\,2}$ that holds regardless of which, if any, constraint predominates. Then we identify a modulator design that achieves this upper bound when the average-energy constraint predominates.

Suppressing the common argument t of all entities, we have

$$(g_i-g_j)^2 = (g_i-g_{\min})^2 + (g_j-g_{\min})^2 - 2(g_i-g_{\min})(g_j-g_{\min}) \le (g_i-g_{\min})^2 + (g_j-g_{\min})^2,$$

the inequality holding because $(g_i-g_{\min}) \ge 0$ for all $i \in \{1,2,\ldots,q\}$. Furthermore, for $i \ne j$, equality holds if and only if at most one of $g_i$ and $g_j$ is strictly greater than $g_{\min}$. Summing over $i \ne j$, then over $j \in \{1,2,\ldots,q\}$, integrating over [0,T], and dividing both sides by q(q-1) yields

$$\bar d^{\,2} = [q(q-1)]^{-1}\int_0^T\sum_{i=1}^{q}\sum_{j=1}^{q}[g_i(t)-g_j(t)]^2\,dt \le \frac{2}{q}\int_0^T\sum_{i=1}^{q}[g_i(t)-g_{\min}]^2\,dt, \qquad (20)$$

with equality holding if and only if, for almost all t, $g_i(t) > g_{\min}$ for at most one value of i in $\{1,2,\ldots,q\}$. Now for any $i \in \{1,2,\ldots,q\}$ and any $t \in [0,T]$,

$$[g_i(t)-g_{\min}]\,[g_{\max}-g_i(t)] \ge 0,$$


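which yields

$$g_i(t) \ge \frac{g_i^2(t) + g_{\max}g_{\min}}{g_{\max}+g_{\min}}, \qquad (21)$$

with equality if and only if $g_i(t) = g_{\min}$ or $g_i(t) = g_{\max}$. We then have

$$[g_i(t)-g_{\min}]^2 = g_i^2(t) - 2g_{\min}g_i(t) + g_{\min}^2 \le g_i^2(t) - \frac{2g_{\min}}{g_{\max}+g_{\min}}\left[g_{\max}g_{\min} + g_i^2(t)\right] + g_{\min}^2 = \left[\frac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]\left[g_i^2(t)-g_{\min}^2\right], \qquad (22)$$

where the inequality follows by virtue of (21), and equality holds if and only if $g_i(t) = g_{\min}$ or $g_i(t) = g_{\max}$. Finally, substituting (22) into (20), using the average energy constraint (15), and noting the conditions for equality yields the following lemma.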


Lemma 1. Given $s_{\max}$ as the maximum average signal counts per channel use, and $g_{\min} \le g_i(t) \le g_{\max}$ for $t \in [0,T]$ and $i \in \{1,2,\ldots,q\}$, then

$$\bar d^{\,2} \le 2\left[\frac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]s_{\max}. \qquad (23)$$

Furthermore, equality holds if and only if both: (a) at any time $t \in [0,T]$, all signals in G take on value $g_{\min}$ except at most one, which takes on value $g_{\max}$; and (b)

$$\frac{1}{q}\sum_{i=1}^{q}\int_0^T g_i^2(t)\,dt - g_{\min}^2 T = s_{\max}. \qquad (24)$$

Here, condition (a) for equality is simply a combination of the conditions for equality of (20) and (22), while condition (b) is the condition for equality in (15).

A signal set that is equidistant and achieves the upper bound in Lemma 1 with equality, and which therefore maximizes the cutoff rate $R_{0,\infty}$ when the average energy constraint predominates, is characterized in the following lemma.

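Lemma 2. If $s_{\max}$ satisfies (19a), equality is achieved in (23) by the equidistant pulse-position modulation (PPM) signal set

$$g_i(t) = \begin{cases} g_{\max}, & (i-1)T/q \le t < (i-1+\epsilon)T/q\\ g_{\min}, & \text{otherwise for } 0 \le t \le T,\end{cases} \qquad (25)$$

where

$$\epsilon = \frac{q\,s_{\max}}{T(g_{\max}^2-g_{\min}^2)}. \qquad (26)$$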


To establish Lemma 2, note that the PPM signal set is clearly equidistant and that condition (19a) is equivalent to $\epsilon \le 1$, so that condition (a) in Lemma 1 is satisfied. Moreover, the average number of signal counts per channel use for the PPM signal set is given by

$$\bar s = \frac{1}{q}\sum_{i=1}^{q}\int_0^T\left[g_i^2(t)-g_{\min}^2\right]dt = \frac{\epsilon T}{q}(g_{\max}^2-g_{\min}^2).$$

Hence, from (26), $\bar s = s_{\max}$, so condition (b) of Lemma 1 is also satisfied. Consequently, by Lemma 1, $\bar d^{\,2}$ for this signal set equals the upper bound in (23). Also, it is straightforward to verify by direct calculation for this PPM signal set and $\epsilon \le 1$ that $\bar d^{\,2}$ equals the upper bound. Lemma 2 follows, and we conclude that the equidistant PPM signal set (25) maximizes $R_{0,\infty}$ when the average energy constraint predominates.

Lemma 2 can be strengthened by noting that the equidistant PPM signal set (25) is the unique signal set that achieves equality in (23), modulo shifting or splitting of pulses while keeping them nonoverlapping and keeping the total "on-time" of any $g_i$ equal to $\epsilon T/q$. This is because condition (a) of Lemma 1 is satisfied if and only if pulses are nonoverlapping and because $\epsilon \le 1$ is chosen precisely to use up all the available energy, as required by condition (b) of Lemma 1.
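The equality case of Lemma 1 for the PPM set can be checked numerically. The following is a minimal sketch, not the authors' procedure: it samples piecewise-constant signals on a uniform grid and compares the average squared distance of (18) with the bound of Lemma 1. The parameter values (q = 4, T = 1, g_min = 1, g_max = 3, pulses occupying half of each slot) and all function names are illustrative assumptions.

```python
def ppm_signal_set(q, bins_per_slot, on_bins, g_min, g_max):
    """Return q piecewise-constant PPM signals sampled on q*bins_per_slot equal bins."""
    n_bins = q * bins_per_slot
    signals = []
    for i in range(q):
        g = [g_min] * n_bins
        start = i * bins_per_slot
        for k in range(start, start + on_bins):
            g[k] = g_max          # pulse "on" at the start of slot i
        signals.append(g)
    return signals


def average_distance_sq(signals, T):
    """Average squared distance: [q(q-1)]^-1 * sum_i sum_j int (g_i - g_j)^2 dt."""
    q = len(signals)
    dt = T / len(signals[0])
    total = 0.0
    for gi in signals:
        for gj in signals:
            total += sum((a - b) ** 2 for a, b in zip(gi, gj)) * dt
    return total / (q * (q - 1))


def signal_counts(signals, T, g_min):
    """Average signal counts per channel use: (1/q) sum_i int g_i^2 dt - g_min^2 T."""
    q = len(signals)
    dt = T / len(signals[0])
    energy = sum(sum(a * a for a in g) for g in signals) * dt / q
    return energy - g_min ** 2 * T


def lemma1_bound(s_max, g_min, g_max):
    """Upper bound of Lemma 1: 2 * s_max * (g_max - g_min) / (g_max + g_min)."""
    return 2.0 * s_max * (g_max - g_min) / (g_max + g_min)


# Illustrative values (assumed, not from the report).
q, T, g_min, g_max = 4, 1.0, 1.0, 3.0
signals = ppm_signal_set(q, bins_per_slot=20, on_bins=10, g_min=g_min, g_max=g_max)
s_bar = signal_counts(signals, T, g_min)
d2 = average_distance_sq(signals, T)
bound = lemma1_bound(s_bar, g_min, g_max)
print(s_bar, d2, bound)
```

With nonoverlapping pulses the computed distance and the bound agree to rounding error; letting two pulses overlap in the sketch makes the distance fall below the bound, which is the content of condition (a).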


Peak-Amplitude Constraint Predominates. By this we mean that the energy constraint (15) is not a limiting consideration. We therefore neglect it, as well as the equidistance constraint, and consider the problem of maximizing $\bar d^{\,2}$ in (18) subject only to (17). The average energy required by the signals that solve this problem will then provide conditions for dominance of the peak-amplitude constraint. Among the solutions to the relaxed problem are ones satisfying the equidistance constraint, and these are then solutions to the fully constrained problem.

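In the Appendix, we derive the following upper bounds on $\bar d^{\,2}$:

$$\bar d^{\,2} \le \tfrac{1}{2}\,q(q-1)^{-1}(g_{\max}-g_{\min})^2\,T, \quad q \text{ even}, \qquad (27a)$$

$$\bar d^{\,2} \le \tfrac{1}{2}\,(q+1)q^{-1}(g_{\max}-g_{\min})^2\,T, \quad q \text{ odd}. \qquad (27b)$$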


An alternative and simpler derivation for q even is as follows. For any choice of $g_s$, there holds

$$(g_i-g_j)^2 = (g_i-g_s)^2 + (g_j-g_s)^2 - 2(g_i-g_s)(g_j-g_s),$$

so that summing over i and j yields

$$\sum_{i=1}^{q}\sum_{j=1}^{q}(g_i-g_j)^2 = 2q\sum_{i=1}^{q}(g_i-g_s)^2 - 2q^2(c-g_s)^2,$$

where c is the centroid $c = q^{-1}\sum_{i=1}^{q} g_i$. Thus, from (18),

$$\bar d^{\,2} = 2(q-1)^{-1}\int_0^T\sum_{i=1}^{q}[g_i(t)-g_s(t)]^2\,dt - 2q(q-1)^{-1}\int_0^T[c(t)-g_s(t)]^2\,dt \le 2(q-1)^{-1}\int_0^T\sum_{i=1}^{q}[g_i(t)-g_s(t)]^2\,dt, \qquad (28)$$

with equality holding if and only if $c(t) = g_s(t)$ for almost all $t \in [0,T]$. Taking $g_s(t) = \tfrac{1}{2}(g_{\max}+g_{\min})$ implies $|g_i(t)-g_s| \le \tfrac{1}{2}(g_{\max}-g_{\min})$, and the bound in (27a) then follows from (28). This bound holds for both odd and even values of q, but it is tight only for q even, and the more precise bound (27b) derived in the Appendix for q odd is the one that is achieved with equality.
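The centroid identity used in this alternative derivation is easy to spot-check. The sketch below is illustrative only; the function names and sample values are assumptions. It evaluates both sides of the identity at a single time instant for an arbitrary reference level g_s.

```python
def pairwise_sum(values):
    """Left side: sum over all ordered pairs (i, j) of (g_i - g_j)^2."""
    return sum((a - b) ** 2 for a in values for b in values)


def centroid_form(values, g_s):
    """Right side: 2q * sum_i (g_i - g_s)^2 - 2 q^2 (c - g_s)^2, with c the centroid."""
    q = len(values)
    c = sum(values) / q
    return 2 * q * sum((a - g_s) ** 2 for a in values) - 2 * q * q * (c - g_s) ** 2


# Sample signal values at one instant (assumed for illustration):
# half at g_max = 3, half at g_min = 1.
sample = [1.0, 3.0, 3.0, 1.0]
print(pairwise_sum(sample), centroid_form(sample, 2.0), centroid_form(sample, 1.5))
```

The subtracted term vanishes exactly when the reference level equals the centroid, which is why the half-on, half-off configuration attains the bound for q even.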

Any set of q equidistant signals G satisfying (17) and achieving the upper bound (27a) for q even or (27b) for q odd is a signal set maximizing $R_{0,\infty}$. Signal sets having these properties can be identified for certain values of q by the following procedure. Partition [0,T] into m equal subintervals, and define q functions $p_i(t)$, $i = 1,2,\ldots,q$, that are piecewise constant, having a constant value of 1 or 0 over each subinterval. Then, $p_i(t)$ can be identified by a binary codeword of length m bits. If we write $g_i(t) = g_{\min} + p_i(t)[g_{\max}-g_{\min}]$, it is enough to find q binary codewords of length m whose common Hamming distance satisfies the conditions in Table 1. The last column in this table reflects a necessary condition for optimality that follows immediately from conditions for equality in (A1) that yields the upper bound (27); namely, for all $t \in [0,T]$,

(i) for q even, q/2 of the signals take on value $g_{\max}$ and the remaining q/2 value $g_{\min}$,

(ii) for q odd, (q-1)/2 or (q+1)/2 of the signals take on value $g_{\max}$ and the remainder $g_{\min}$.

This provides an additional check on the optimality of the following signal set and was an important aspect in our identification of it. For optimality, however, it is sufficient that the signal set be equidistant and achieve the appropriate upper bound (i.e., Hamming distance).

For q a multiple of 4 and such that a Hadamard matrix of order q exists, q codewords satisfying these conditions are easily obtained by deleting the first column (all ones) of the normalized Hadamard matrix [10,11]. From this, s = q-1 codewords satisfying row 3 of Table 1 with s replacing q can be obtained by deleting the codeword of all ones. Also, p = q/2 codewords satisfying row 2 of Table 1 with p replacing q can be obtained by deleting all rows of the normalized Hadamard matrix that have a 0 in (say) the second column and then deleting the first two columns. From this, s = q/2 - 1 codewords satisfying row 4 of Table 1 with s replacing q can be obtained by deleting the codeword of all ones. Since Hadamard matrices for q = 1, 2, or a multiple of 4 are known up to q = 200 except for q = 188, this procedure gives an optimizing signal set G for all $q \le 200$ except for q = 93, 94, 187, and 188. Also, infinite families of Hadamard matrices are known, for example those for which $q = 2^k$ for some positive integer k: these coincide with cyclic maximal-length shift register codes, and they are also a subset of the first-order Reed-Muller codewords of this length.
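The first step of this procedure can be illustrated for $q = 2^k$ with the Sylvester construction, which is one standard way to obtain a normalized Hadamard matrix. The sketch below is an illustration under that assumption, not the construction used in [10,11]. It deletes the all-ones first column, maps +1 to 1 and -1 to 0, and checks that the q codewords of length q-1 share a common Hamming distance of q/2 and that each column contains q/2 ones. The function names and the signal levels are hypothetical.

```python
def sylvester_hadamard(k):
    """Normalized Hadamard matrix of order 2**k with entries +1/-1 (Sylvester construction)."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def hadamard_codewords(k):
    """Map +1 -> 1 and -1 -> 0, then delete the all-ones first column."""
    return [[1 if x == 1 else 0 for x in row[1:]] for row in sylvester_hadamard(k)]


def hamming(a, b):
    """Number of positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


code = hadamard_codewords(3)            # q = 8 codewords of length m = 7
q, m = len(code), len(code[0])
distances = {hamming(code[i], code[j]) for i in range(q) for j in range(i + 1, q)}
column_weights = {sum(word[t] for word in code) for t in range(m)}

# Average squared distance of the resulting signal set versus the bound for q even.
T, g_min, g_max = 1.0, 1.0, 3.0         # assumed illustrative values
common = next(iter(distances))
d2 = common * (T / m) * (g_max - g_min) ** 2
bound_even = 0.5 * q / (q - 1) * (g_max - g_min) ** 2 * T
print(q, m, distances, column_weights, d2, bound_even)
```

Because every pair of codewords differs in the same number of subintervals, the signal set is equidistant, and the common distance reproduces the bound for q even.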

We  remark  that  complementation  of  an  optimum  signal  set  yields  another 
optimum  signal  set.  Also,  time  sharing  of  any  two  optimum  signal  sets  yields 
another  optimum  signal  set. 

The average energy of these signal sets is easily calculated as follows. For q even, because, at any time, q/2 signals have value $g_{\max}$ and the remainder $g_{\min}$,

$$E_g = \tfrac{1}{2}(g_{\max}^2+g_{\min}^2)T, \qquad (29)$$

which implies

$$\bar s = E_g - g_{\min}^2 T = \tfrac{1}{2}(g_{\max}^2-g_{\min}^2)T. \qquad (30)$$

Also, for q odd, at any time, (q+1)/2 signals have value $g_{\min}$ and the remainder $g_{\max}$, so

$$E_g = \frac{1}{2q}\left[(q-1)g_{\max}^2 + (q+1)g_{\min}^2\right]T, \qquad (31)$$

which implies

$$\bar s = E_g - g_{\min}^2 T = \frac{q-1}{2q}(g_{\max}^2-g_{\min}^2)T. \qquad (32)$$

This uses less energy than taking (q+1)/2 signals with value $g_{\max}$ and, thus, extends the range of average energies for which this choice is optimum; namely, the available average energy must exceed that required for $\bar s$ of (30) or (32), which yields condition (19c).

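Finally, the distance $\bar d^{*2}$ achieved by these signals that maximize $R_{0,\infty}$ when the peak-amplitude constraint predominates is given by

$$\bar d^{*2} = \begin{cases}\dfrac{q}{q-1}\left[\dfrac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]\bar s, & q \text{ even}\\[8pt] \dfrac{q+1}{q-1}\left[\dfrac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]\bar s, & q \text{ odd},\end{cases} \qquad (33)$$

with $\bar s$ given by (30) for q even and by (32) for q odd; these values coincide with the bounds (27a) and (27b).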

Neither Constraint Predominates. If $s_{\max}$ satisfies (19a) or (19c), the PPM signal set or, respectively, the Hadamard-derived signal set maximizes $R_{0,\infty}$. Unless q = 2 or q = 3, we are left with a range of values of $s_{\max}$ for which a solution has yet to be identified. This "gap" region is specified in (19b). For q = 2 or q = 3, this region collapses to the empty set, and at the common upper limit of the range (19a) and lower limit of range (19c), the PPM and Hadamard-derived signal sets are equivalent and optimum. For $q \ge 4$, we now demonstrate that an optimum signal set results by time sharing the PPM and Hadamard-derived solutions.

The gap region has strictly positive length if $q \ge 4$, and then any point in either interval (19b) can be expressed as a strictly convex combination of the endpoints; that is, for q even and $s_{\max}$ in the appropriate interval (19b), there exists a unique $\lambda \in (0,1)$ such that

$$s_{\max} = \left[\frac{\lambda}{q} + \frac{1-\lambda}{2}\right](g_{\max}^2-g_{\min}^2)T, \qquad (34a)$$

while for q odd and $s_{\max}$ in the appropriate interval (19b), there exists a unique $\lambda \in (0,1)$ such that

$$s_{\max} = \left[\frac{\lambda}{q} + \frac{(1-\lambda)(q-1)}{2q}\right](g_{\max}^2-g_{\min}^2)T. \qquad (34b)$$

An optimum choice of modulation can now be given in terms of $\lambda$.


Lemma 3. For q even (respectively, odd) and $s_{\max}$ in the appropriate interval specified in (19b), let $\lambda$ be defined by (34a) (respectively, (34b)). Then an equidistant signal set that maximizes $R_{0,\infty}$ while satisfying the average energy and peak-amplitude constraints with equality is: for fraction $\lambda$ of the interval [0,T], use the "full-width" PPM signal set (25) with $\epsilon = 1$ and T replaced by $\lambda T$, and for fraction $(1-\lambda)$ of [0,T], use the signal set defined by the appropriate Hadamard matrix, as discussed in the previous section, with T replaced by $(1-\lambda)T$.

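Lemma 3 is proven as follows. For an arbitrary choice of $\alpha \in [0,1]$ and an arbitrary choice of maximum average energy $s_{1,\max} \in [0, s_{\max}]$ allocated to the interval $[0, \alpha T]$, we have from Lemma 1

$$\frac{1}{q(q-1)}\int_0^{\alpha T}\sum_{i=1}^{q}\sum_{j=1}^{q}[g_i(t)-g_j(t)]^2\,dt \le 2\left[\frac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]s_{1,\max}, \qquad (35)$$

and from (27)

$$\frac{1}{q(q-1)}\int_{\alpha T}^{T}\sum_{i=1}^{q}\sum_{j=1}^{q}[g_i(t)-g_j(t)]^2\,dt \le \begin{cases}\dfrac{(1-\alpha)Tq}{2(q-1)}(g_{\max}-g_{\min})^2, & q \text{ even} \qquad (36a)\\[8pt] \dfrac{(1-\alpha)T(q+1)}{2q}(g_{\max}-g_{\min})^2, & q \text{ odd}. \qquad (36b)\end{cases}$$

Adding these expressions and using (18), we obtain

$$\bar d^{\,2} \le \begin{cases}2\left[\dfrac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]s_{1,\max} + \dfrac{(1-\alpha)Tq}{2(q-1)}(g_{\max}-g_{\min})^2, & q \text{ even} \qquad (37a)\\[8pt] 2\left[\dfrac{g_{\max}-g_{\min}}{g_{\max}+g_{\min}}\right]s_{1,\max} + \dfrac{(1-\alpha)T(q+1)}{2q}(g_{\max}-g_{\min})^2, & q \text{ odd}. \qquad (37b)\end{cases}$$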


Now, from Lemma 1, equality holds in (35) if and only if both:

(i) at any time $t \in [0,\alpha T]$, all signals take on value $g_{\min}$, except at most one, which takes on value $g_{\max}$. This implies that the average energy $\bar s_1$ used on $[0,\alpha T]$ is

$$\bar s_1 = \frac{1}{q}\sum_{i=1}^{q}\int_0^{\alpha T} g_i^2(t)\,dt - g_{\min}^2\,\alpha T \le \frac{\alpha T}{q}(g_{\max}^2-g_{\min}^2);$$

(ii) $\bar s_1 = s_{1,\max}$.

Thus, a necessary condition for equality in (35) is

$$s_{1,\max} \le \frac{\alpha T}{q}(g_{\max}^2-g_{\min}^2). \qquad (38)$$


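Furthermore, the derivation in the Appendix shows that equality holds in (36a) only if half of the signals take on value $g_{\min}$ and the remainder $g_{\max}$. This involves an average energy usage $\bar s_2$ on $[\alpha T, T]$ of

$$\bar s_2 = \frac{1}{q}\sum_{i=1}^{q}\int_{\alpha T}^{T} g_i^2(t)\,dt - g_{\min}^2(1-\alpha)T = \frac{(1-\alpha)T}{2}(g_{\max}^2-g_{\min}^2). \qquad (39)$$

Because of the total average energy constraint, $\bar s_1 + \bar s_2 \le s_{\max}$, we then have, using (34a) and (39),

$$\bar s_1 \le s_{\max} - \bar s_2 = \left[\frac{\lambda}{q} + \frac{\alpha-\lambda}{2}\right](g_{\max}^2-g_{\min}^2)T. \qquad (40)$$

For equality to hold in (37a), it is necessary that both (35) and (36a) hold with equality, and necessary conditions for these are in turn (38) and, combining $\bar s_1 = s_{1,\max}$ with (40),

$$s_{1,\max} \le \left[\frac{\lambda}{q} + \frac{\alpha-\lambda}{2}\right](g_{\max}^2-g_{\min}^2)T, \quad q \text{ even}. \qquad (41a)$$

For q odd, the corresponding necessary conditions for equality in (37b) become (38) and

$$s_{1,\max} \le \left[\frac{\lambda}{q} + \frac{(\alpha-\lambda)(q-1)}{2q}\right](g_{\max}^2-g_{\min}^2)T, \quad q \text{ odd}. \qquad (41b)$$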

We now consider the selection of $s_{1,\max}$ and $\alpha$ to maximize the upper bound (37a) subject to the constraints (38) and (41a), which are necessary conditions for it to hold with equality. Because both (38) and (41a) are constraints on $s_{1,\max}$, we consider each in turn to be dominant in the sense of being more restrictive. The bound (38) is less than or equal to the bound (41a) if and only if $\alpha \ge \lambda$. Substituting (38) into (37a) and simplifying, we obtain

$$\bar d^{\,2} \le (g_{\max}-g_{\min})^2\,T\,\frac{q^2-\alpha(q-2)^2}{2q(q-1)}.$$

Because we are considering $q \ge 4$, this bound is maximized over $\alpha \in [\lambda,1]$ by the unique choice $\alpha = \lambda$. Similarly, the bound (41a) is less than or equal to the bound (38) if and only if $\alpha \le \lambda$. Substituting (41a) into (37a) and simplifying, we obtain

$$\bar d^{\,2} \le (g_{\max}-g_{\min})^2\,T\left[\frac{q}{2(q-1)} + \frac{\alpha(q-2)}{2(q-1)} + \frac{\lambda(2-q)}{q}\right].$$

Again because $q \ge 4$, this bound is maximized over $\alpha \in [0,\lambda]$ by the unique choice $\alpha = \lambda$. Thus, the bound (37a) is maximized, subject to the necessary conditions (38) and (41a) for it to be achieved, by taking $\alpha = \lambda$, and the corresponding maximum bound is

$$\bar d^{\,2} \le (g_{\max}-g_{\min})^2\,T\left[\frac{2\lambda}{q} + \frac{(1-\lambda)q}{2(q-1)}\right], \quad q \text{ even}.$$


But this upper bound is readily achieved by the time sharing of a "full-width" PPM signal set for fraction $\lambda$ of [0,T] and the Hadamard-derived signal set for the remaining fraction $1-\lambda$ of [0,T]. Furthermore, the average energy required by this solution is exactly $s_{\max}$. For q odd, a similar analysis leads again to the unique choice $\alpha = \lambda$ and the corresponding maximum bound

$$\bar d^{\,2} \le \left[\frac{2\lambda}{q} + \frac{(1-\lambda)(q+1)}{2q}\right](g_{\max}-g_{\min})^2\,T, \quad q \text{ odd}.$$

Again, this bound is achieved by the time sharing of a full-width PPM signal set for fraction $\lambda$ of [0,T] and the appropriate Hadamard-matrix derived signal set for the remaining fraction $1-\lambda$ of [0,T], and the average energy required by this solution is exactly $s_{\max}$, as before.
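The three regimes (PPM, time sharing, Hadamard-derived) can be collected into one function that returns the best achievable average squared distance for a given $s_{\max}$. This is a sketch assembled from the bounds and breakpoints derived in this section; the function name, return convention, and numerical values are assumptions for illustration.

```python
def optimal_dbar2(s_max, q, T, g_min, g_max):
    """Return (regime, time-sharing fraction or None, best average squared distance)."""
    delta2 = (g_max - g_min) ** 2
    span = (g_max ** 2 - g_min ** 2) * T
    s_low = span / q                               # largest s_max for which PPM alone is optimal
    if q % 2 == 0:
        s_high = span / 2.0                        # smallest s_max for peak-amplitude regime, q even
        d2_peak = 0.5 * q / (q - 1) * delta2 * T   # peak-amplitude bound, q even
    else:
        s_high = (q - 1) * span / (2.0 * q)        # same breakpoint, q odd
        d2_peak = 0.5 * (q + 1) / q * delta2 * T   # peak-amplitude bound, q odd
    if s_max <= s_low:
        # Average-energy constraint predominates: Lemma 1 bound, met by PPM.
        return "PPM", None, 2.0 * s_max * (g_max - g_min) / (g_max + g_min)
    if s_max >= s_high:
        # Peak-amplitude constraint predominates: Hadamard-derived set.
        return "Hadamard", None, d2_peak
    # Gap region: s_max is a convex combination of the two breakpoints.
    lam = (s_high - s_max) / (s_high - s_low)
    d2 = lam * (2.0 / q) * delta2 * T + (1.0 - lam) * d2_peak
    return "time-sharing", lam, d2


# Illustrative values (assumed): q = 4, T = 1, g_min = 1, g_max = 3.
for s in (1.0, 3.0, 5.0):
    print(s, optimal_dbar2(s, 4, 1.0, 1.0, 3.0))
```

For q = 2 and q = 3 the two breakpoints coincide, so the time-sharing branch is never reached, in agreement with the remark that the gap region is empty in those cases.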


IV.  Efficient  Energy  Utilization 

Denote by $\lambda_s$ the count rate due to the signal alone when it is "on" for any of the optimal signal sets derived in the previous section. Then, $g_{\max}^2 = \lambda_s + \lambda_0$ and $g_{\min}^2 = \lambda_0$, where $\lambda_0$ is the count rate due to the noise alone. In considering designs for efficient energy utilization, we distinguish three situations depending on which of $\lambda_s$, $s_{\max}$, and $\epsilon$ are adjustable and which are fixed. We seek to identify values of the adjustable parameters so that the cutoff rate per unit energy, $R_{0,\infty}/\bar s$, is greatest.

1. $s_{\max}$ adjustable, $\lambda_s$ fixed. The value of $\bar d^{\,2}$ achieved with the optimal signal sets of the previous section is shown in Fig. 2 as a function of $s_{\max}$ assuming that $\lambda_s$ is a fixed constant. Here, $\bar d^{\,2}$ is a piecewise linear function of $s_{\max}$ with the following parameters:

il  - (44) 


tl-(l+A^/X„)'®)^„T.  , odd 


S-  = 2(X  /X  )[1-(1+X 

1 Os  s 0 


■^^(Xo/Xs)[l“(l+Xg/Xo)*^]^,  q even 


(X./X  )[1-(1+X„/X„)*^]2,  q odd. 


q2-4q+3  0 ® ® 0 


Using (10), with equality for optimal signal sets, and using the expressions for $\bar d^{\,2}$ implied by Fig. 2 and (44)-(46), we conclude that

$$dR_{0,\infty}/ds_{\max} = 0.72\,(q-1)\left[(q-1)+\exp(\tfrac{1}{2}\bar d^{\,2})\right]^{-1}(\text{slope}),$$

where the factor "slope" is $s_1$, $s_2$, or zero depending on which constraint, if any, predominates. Thus, $dR_{0,\infty}/ds_{\max}$ decreases monotonically with increasing $s_{\max}$, so the signal energy is used most efficiently when $s_{\max}$ is small, where the energy constraint predominates and where the PPM signal set is optimal and utilizes energy $\bar s = s_{\max}$. This situation is analogous to that studied by Massey [8,9] for the additive Gaussian-noise channel. For $\bar s = s_{\max}$ small, we conclude that


with equality achieved for $\bar s \to 0$. Hence,


is an upper bound on $R_{0,\infty}$ for any choice of $\bar s$ and any choice of modulation, with near equality holding when $\bar s$ is small and for the PPM signal set. Since for the PPM signal set $\bar s = \epsilon T\lambda_s/q$, this means that when $\lambda_s$ is fixed, the most efficient energy utilization occurs for narrow pulses, $\epsilon$ being selected as small as practically feasible.
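The constant 0.72 in the slope expression is $1/(2\ln 2)$, which can be confirmed by differentiating the equidistant cutoff-rate formula numerically. Equation (10) is not reproduced in this section, so the sketch assumes the form $R_{0,\infty} = \log_2 q - \log_2[1+(q-1)\exp(-\bar d^{\,2}/2)]$, which is the form that appears in (51). The function names and the test point are illustrative.

```python
import math


def cutoff_rate(d2, q):
    """Assumed equidistant cutoff rate in bits: log2 q - log2[1 + (q-1) exp(-d2/2)]."""
    return math.log2(q) - math.log2(1.0 + (q - 1) * math.exp(-0.5 * d2))


def rate_slope(d2, q, slope):
    """Closed-form derivative with respect to s_max when d2 grows linearly with the given slope."""
    return (q - 1) / (2.0 * math.log(2.0)) / ((q - 1) + math.exp(0.5 * d2)) * slope


# Compare against a central finite difference at an arbitrary point (assumed values).
q, slope, s, h = 4, 1.0, 1.0, 1e-6
numeric = (cutoff_rate(slope * (s + h), q) - cutoff_rate(slope * (s - h), q)) / (2.0 * h)
closed_form = rate_slope(slope * s, q, slope)
print(1.0 / (2.0 * math.log(2.0)), numeric, closed_form)
```

Since the bracketed term grows with $\bar d^{\,2}$ and the slope factor does not increase from one regime to the next, the derivative falls as $s_{\max}$ rises, which is the monotonicity used in the text.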

2. $\lambda_s$ adjustable, $\bar s$ fixed. By a somewhat messy but straightforward calculation, it is readily verified that $dR_{0,\infty}/d\lambda_s \ge 0$ for $\bar s$ fixed. Thus, $R_{0,\infty}$, and hence $R_{0,\infty}/\bar s$ for $\bar s$ fixed, is a nondecreasing function of $\lambda_s$. Consequently, the most efficient energy utilization is achieved by selecting $\lambda_s$ large and, therefore, operating in the region where the energy constraint predominates. This implies using the PPM signal set with as large a value of signal count rate $\lambda_s$ as practical and sufficiently narrow pulses that $\bar s = \epsilon T\lambda_s/q$.

3. $\bar s$ adjustable, PPM signal set with $\epsilon$ fixed. The PPM signal set with a fixed pulse width $\epsilon T/q$ maximizes $R_{0,\infty}$ provided the energy constraint predominates, which we assume. For $\epsilon$ fixed and $\bar s = \epsilon T\lambda_s/q$, we find that

$$R_{0,\infty}(\bar s, n_{\text{eff}}) = \log_2 q - \log_2\left\{1+(q-1)\exp\left[-\tfrac{1}{q}\,n_{\text{eff}}\left(1-\sqrt{1+q\,\alpha_{\text{eff}}}\right)^2\right]\right\}, \qquad (51)$$

where we define

$$n_{\text{eff}} = \epsilon\,\bar n$$

as the "effective" average number of noise counts per channel use, and where

$$\alpha_{\text{eff}} = \bar s/n_{\text{eff}}$$

is the signal-to-noise energy ratio. Graphs of $R_{0,\infty}(\bar s, n_{\text{eff}})$ as a function of $\alpha_{\text{eff}}$ for several values of $n_{\text{eff}}$ are given in Fig. 3. These graphs are seen to increase monotonically with $\alpha_{\text{eff}}$ for each fixed value of $n_{\text{eff}}$. Thus, as expected, the performance improves systematically for fixed $n_{\text{eff}}$ as the average signal energy per channel use, $\bar s$, increases. However, while starting from $\bar s = 0$ the performance initially improves rapidly, there is a point of diminishing returns after which there is only marginal improvement for further increases in $\bar s$. For each $n_{\text{eff}}$, there is an $\bar s = s^*(n_{\text{eff}})$ such that for all $\bar s > 0$ there holds

$$\frac{R_{0,\infty}(\bar s, n_{\text{eff}})}{\bar s} \le \frac{R_{0,\infty}(s^*, n_{\text{eff}})}{s^*}. \qquad (53)$$

This value of $\bar s$ can be found graphically for each $n_{\text{eff}}$ by pivoting a vertical line about the origin ($R_{0,\infty} = 0$, $\alpha_{\text{eff}} = 0$) in Fig. 3 until it lies tangent to the graph of $R_{0,\infty}(\bar s, n_{\text{eff}})$. The abscissa of the point of tangency is $s^*/n_{\text{eff}}$. Inequality (53) holds because the graph of $R_{0,\infty}(\bar s, n_{\text{eff}})$ lies on or below the line so constructed for all $\bar s$. It follows from (53) that the most efficient utilization of energy, in the sense that the cutoff rate per unit energy is greatest, is achieved when $\bar s = s^*$. The dashed line in Fig. 3 is a fit of $s^*/n_{\text{eff}}$ versus $n_{\text{eff}}$ obtained graphically by connecting together the points of tangency described above. From this fit, we find for the range of average noise counts in the figure that $s^*$ and $n_{\text{eff}}$ are approximately related by the following power law:


s*  = 2.349 

eff 


This is shown as the solid line in Fig. 4. A measure of the range of energies nearly as efficient as $s^*$ can be determined for each $n_{\text{eff}}$ from the values of $\alpha_{\text{eff}} = \bar s/n_{\text{eff}}$ for which $R_{0,\infty}(\bar s, n_{\text{eff}})$ is close to, say within 10% of, the ordinate of the line of tangency constructed as above. Values of $\bar s$ within the dashed lines in Fig. 4 satisfy this 10% condition; Fig. 4 implies that for maximally efficient energy utilization $\bar s$ should be kept within about ±2 dB of $s^*$.
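The graphical tangent construction for $s^*$ has a direct numerical counterpart: maximize the ratio of cutoff rate to signal energy over a grid. The sketch below assumes the PPM cutoff rate takes the form $\log_2 q - \log_2\{1+(q-1)\exp[-(n_{\text{eff}}/q)(1-\sqrt{1+q\bar s/n_{\text{eff}}})^2]\}$, which follows from combining the Lemma 1 bound with $\bar s = \epsilon T\lambda_s/q$ and $n_{\text{eff}} = \epsilon\lambda_0 T$. The grid limits and function names are arbitrary assumptions, and the result is not claimed to reproduce the fitted power law of the report.

```python
import math


def cutoff_rate_ppm(s, n_eff, q):
    """Assumed PPM cutoff rate (bits) versus signal counts s and effective noise counts n_eff."""
    if s <= 0.0:
        return 0.0
    alpha = s / n_eff                                   # signal-to-noise energy ratio
    exponent = (n_eff / q) * (1.0 - math.sqrt(1.0 + q * alpha)) ** 2
    return math.log2(q) - math.log2(1.0 + (q - 1) * math.exp(-exponent))


def most_efficient_energy(n_eff, q, s_hi=200.0, steps=20000):
    """Grid search for the s that maximizes cutoff rate per unit signal energy."""
    best_s, best_ratio = None, -1.0
    for k in range(1, steps + 1):
        s = s_hi * k / steps
        ratio = cutoff_rate_ppm(s, n_eff, q) / s
        if ratio > best_ratio:
            best_s, best_ratio = s, ratio
    return best_s, best_ratio


s_star, best = most_efficient_energy(n_eff=10.0, q=2)
print(s_star, best)
```

The ratio tends to zero both for very small and very large signal energy, so the maximizer is interior; it corresponds to the point of tangency of a line through the origin with the rate curve.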


V.  Effect  of  Finite  Output  Quantization 


The cutoff rate decreases from $R_{0,\infty}$ as the dimension, $q'$, of the output alphabet decreases. This degradation is greatest for a binary input alphabet (q = 2) when $q' = 2$, which corresponds to making bit-by-bit decisions without any coding. For a Gaussian model, Massey [8,9] concludes that choosing $q' = 2$ results in a quantization loss of 2.04 dB; that is, in the efficient range of energy utilization for the Gaussian model, the energy per channel use must be about 2 dB greater for $q' = 2$ in order to achieve the same cutoff rate as when $q' = \infty$. Moreover, Massey also concludes that for $q' = 8$, there is virtually no quantization loss. The degradation for the Poisson model is somewhat


smaller  than  that  found  by  Massey  when  about  the  same 


order  when  n « 1. 

eff 


Suppose the input and output alphabets are X = {0,1} and V = {0,1}, so that $q = q' = 2$. We adopt a binary pulse-position modulation format with pulses of duration $\epsilon T/2$ because this maximizes $R_{0,\infty}$ when the energy constraint predominates. For this choice, each symbol interval is divided into two equal subintervals, and output letters are generated according to: "produce 1 if $n[0,\epsilon T/2] < n[T/2,(1+\epsilon)T/2]$, otherwise produce 0," where $n[0,\epsilon T/2]$ and $n[T/2,(1+\epsilon)T/2]$ are the number of points observed in the first and second signalling interval, respectively. Here, $n[0,\epsilon T/2]$ and $n[T/2,(1+\epsilon)T/2]$ are independent Poisson random variables with mean parameters $\bar s + (n_{\text{eff}}/2)$ and $n_{\text{eff}}/2$, respectively, when 0 is the input letter and $n_{\text{eff}}/2$ and $\bar s + (n_{\text{eff}}/2)$, respectively, when 1 is the input letter. As in the previous discussion, $\bar s$ and $n_{\text{eff}}$ are the average number of signal counts received per channel use and effective number of noise counts received per channel use. It is straightforward to conclude for these assumptions that the cutoff rate is given by

$$R_{0,q'=2} = 1 - \log_2\left\{1+2[p(1-p)]^{1/2}\right\}, \qquad (55)$$


where  p is  the  binary  error  probability  associated  with  producing  an  output 

symbol  1 (or  a 0)  when  the  input  symbol  is  a 0 (or  a 1,  respectively).  This 

error  probability  is  given  graphically  for  certain  values  of  n and  a range 

eft 

of  s by  Pratt  [4,  p.'209:  identify  n “ 2p  and  s=p  ]. 

fly  Jd  SfO 
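The hard-decision cutoff rate can also be evaluated without reading the error probability from a graph. The sketch below is an illustration, not the procedure used for Table 2: it computes the probability that the noise-only slot strictly out-counts the signal slot for two independent Poisson counts, then applies the binary cutoff-rate formula. It treats only the error for input letter 0 and ignores the asymmetry that ties introduce for input letter 1. The truncation limit `k_max` and the function names are assumptions.

```python
import math


def poisson_pmf(mean, k_max):
    """Poisson probabilities P(N = k) for k = 0..k_max, built by recursion."""
    pmf = [math.exp(-mean)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def hard_decision_error(s, n_eff, k_max=400):
    """P(output 1 | input 0): count in the signal slot strictly less than in the other slot."""
    first = poisson_pmf(s + n_eff / 2.0, k_max)    # slot carrying signal plus noise
    second = poisson_pmf(n_eff / 2.0, k_max)       # slot carrying noise only
    p, cdf_first = 0.0, 0.0
    for k in range(1, k_max + 1):
        cdf_first += first[k - 1]                  # P(first slot count <= k-1)
        p += second[k] * cdf_first
    return p


def cutoff_rate_hard(p):
    """Binary-input, binary-output cutoff rate in bits for crossover probability p."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))


p = hard_decision_error(10.0, 10.0)                # assumed example: 10 signal, 10 noise counts
print(p, cutoff_rate_hard(p))
```

Comparing this rate with the soft-decision rate at the same noise level gives a numerical estimate of the hard-decision energy penalty discussed in this section.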

The  values  tabulated  in  Table  2 were  obtained  as  follows:  (l),s*  is 
obtained  from  (54)  for  each  (3),  p*  is  the  value  of  p in  (55)  such 

that  0 < p*  < 1 and  R ^ = R*  ; and  (4) , s is  obtained  by  interpolation 

from the graph given by Pratt. Thus, $s^*$ and $\bar s$ of the table yield the same cutoff rate for $q' = \infty$ and $q' = 2$, respectively. To within the accuracy that the interpolation step can be accomplished, we conclude that about 1.5 dB more signal energy is required with hard decisions than with infinitely soft decisions for the range of 5 to 40 counts per channel use.


VI.  Effect  of  Input  Alphabet  Dimension 


For an input alphabet of dimension q, $q' = \infty$, and an average energy constraint that predominates, q-ary pulse-position modulation maximizes the cutoff rate. We now consider the effect of q in each of the three situations identified in Section IV.


1. $s_{\max}$ adjustable, $\lambda_s$ fixed. From (48) and (49), increasing q from 2 to $\infty$ implies that the greatest rate per unit energy that can be achieved increases by a factor of 2. Moreover, examination of graphs of $R_{0,\infty}/\bar s$ for $R_{0,\infty}$ given by (10) with equality and with $\bar d^{\,2} = s_1\bar s$, where $s_1$ is given in (46), shows that the range of values of $\bar s$ for which the approximation (49) holds closely increases as q increases; in other words, the range of efficient signal energies is extended as q is increased.

2. $\lambda_s$ adjustable, $\bar s$ fixed. For $\lambda_s$ large and the PPM signal set, we see from Fig. 2 and (46) that $\bar d^{\,2} \approx 2\bar s$. Then,


RQ^„~log2q  - log2[l+(q-l)e“®]  £-3^  s.  (56) 

Hence,  for  large  X , R _/s’<(q-l)/q  and,  therefore,  the  largest  rate  per 

s u," 

unit energy increases by no more than a factor of 2 as q increases from 2 to $\infty$.

3.  s adjustable,  PPM  signal  set  with  e fixed.  A graph  of  (51)  as  a function 
’^eff  ” various  values  of  q is  shown  in  Fig.  5.  For  each  q, 

there is a corresponding signal energy that is most efficient; this can be found graphically in the same manner as before, as indicated by the lines of tangency. These efficient energies depend upon q; very roughly from the graphs, we find that $(s^*/16)\,q \approx 1$, so that the most efficient signal energy decreases as q increases. This implies a significant potential improvement in performance at low signal energies by the use of a large input-alphabet dimension q and q-ary pulse-position modulation. These observations appear to hold for other values of $n_{\text{eff}}$ as well.


VII. Effect of Random Detector Gain


Let $\{M(t);\ t \ge 0\}$ be a compound Poisson counting process defined by

$$M(t) = \sum_{n=0}^{N(t)} u_n, \qquad (57)$$

where $\{N(t);\ t \ge 0\}$ is the Poisson counting process defined above, $u_0 = 0$, and $\{u_n;\ n = 1,2,\ldots\}$ is a sequence of independent, identically distributed random variables each having an integer value greater than zero. Here, $\{N(t);\ t \ge 0\}$ models primary photoelectron conversions, and $u_n$ models the number of secondary electrons appearing at the detector output due to the nth conversion. This random gain is an important effect encountered, for example, with avalanche detectors used in optical-fiber communication systems.

In considering a digital-data communication system in which measurements are derived from $\{M(t);\ t \ge 0\}$, it is of interest to know the cutoff rate $R_{0,\infty}$ for infinitely fine quantization. As before, this quantity then places an upper limit on the performance for any receiver employing finite quantization, such as an "integrate and dump" receiver [12,13] in which M(nT) - M[(n-1)T] is used to make a decision about the nth transmitted symbol.

We find $R_{0,\infty}$ to be identical to that in (6), so random detector gain neither degrades nor enhances the cutoff rate for infinitely fine output quantization. This is because the distribution of the random gains is unaffected by the choice of transmitted signal in our model, and it can be verified mathematically by the following steps. First, we write the summation over k in (3) as

$$E_j\left[\Lambda_{ij}^{1/2}\right], \qquad (58)$$

where $\Lambda_{ij}$ is the likelihood ratio for symbol $X_i$ relative to symbol $X_j$, and $E_j(\cdot)$ denotes a conditional expectation given $X_j$. As the output quantization is refined, this becomes

$$f(i,j) = E_j\left[\Lambda_{ij}^{1/2}(M(t);\ 0 \le t \le T)\right], \qquad (59)$$

where $\Lambda_{ij}(M(t);\ 0 \le t \le T)$ is given by the ratio of the sample function densities [18] of $\{M(t);\ 0 \le t \le T\}$ for symbols $X_i$ and $X_j$. This likelihood ratio is found not to be a function of the random gains, and the assertion follows.

A consequence of this assertion is that many of the conclusions reached in preceding sections also apply in the presence of random detector gain.

At the present time, there are too few published results on the binary error probability for an integrate-and-dump receiver for us to examine the potential benefits of employing finer output quantization, but this is a matter of some practical interest for fiber-optic systems.


VIII.  Polarization  Modulation 


Suppose that binary, orthogonal polarization modulation can also be employed in the optical modulator of Fig. 1 in addition to temporal modulation. Then the scalar field E(t,r) becomes a vector $(E_1(t,r), E_2(t,r))$ in which one component is the 0° field and the other one the 90° field. A polarization decomposition of the received field followed by direct detection in each channel then results in two independent point processes, which we label $N_1(t)$ and $N_2(t)$, $0 \le t \le T$. Assume that when the input codeletter is $X_i \in \{X_1, X_2, \ldots, X_q\}$, the count rate for $N_1(t)$ is




(60a) 


and  for  N2(t)  is 


X2i(t) 


8,.(t)  + X-  » g,,(t). 


(60b) 


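Following the procedure used in the last section, as $q' \to \infty$, the sum over $k$ in (3), call it $f(i,j)$, becomes

$$f(i,j) = E_j\left[\Lambda_{ij}^{1/2}(N_1(t), N_2(t);\ 0 \le t \le T)\right] = \exp\left(-\tfrac{1}{2} d_{ij}^2\right), \tag{61}$$

where

$$d_{ij}^2 = \int_0^T \left([g_{1i}(t) - g_{1j}(t)]^2 + [g_{2i}(t) - g_{2j}(t)]^2\right) dt. \tag{62}$$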


The  steps  leading  to  (10)  remain  unchanged  with  (62)  replacing  (8). 
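
As a numerical sanity check on the exponential form in (61)-(62), the sketch below looks at a simplified single-channel case with constant rates, so that the count over [0, T] is Poisson. It assumes, as the form of (62) suggests, that g is the square root of the count rate. The function names and the truncation point `kmax` are illustrative choices and do not come from the report.

```python
import math


def bhattacharyya_poisson(a, b, kmax=200):
    """Sum over counts k of sqrt(P(k; a) * P(k; b)) for Poisson means a and b."""
    total = 0.0
    for k in range(kmax + 1):
        # Log-probabilities avoid overflow in k! for large k.
        log_pa = -a + k * math.log(a) - math.lgamma(k + 1)
        log_pb = -b + k * math.log(b) - math.lgamma(k + 1)
        total += math.exp(0.5 * (log_pa + log_pb))
    return total


def closed_form(a, b):
    """exp(-d^2 / 2) with d^2 = (sqrt(a) - sqrt(b))^2, the constant-rate case of (62)."""
    return math.exp(-0.5 * (math.sqrt(a) - math.sqrt(b)) ** 2)


# Hypothetical mean counts under two different symbols.
mean_i, mean_j = 3.0, 7.5
numeric = bhattacharyya_poisson(mean_i, mean_j)
analytic = closed_form(mean_i, mean_j)
```

With constant rates the two quantities agree to within floating-point error, which is the scalar analogue of the statement that the sum over k tends to an exponential of a squared distance between the g functions.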

We now assume that each of the signals in $S_1 = \{E_{11}(t,r), E_{12}(t,r), \ldots, E_{1q}(t,r)\}$ and $S_2 = \{E_{21}(t,r), E_{22}(t,r), \ldots, E_{2q}(t,r)\}$ satisfies the average energy and peak-amplitude constraints in the section about modulator design based on $R_0$. Then, we have the following optimization problem: select signals in $G_1 = \{g_{11}(t), g_{12}(t), \ldots, g_{1q}(t)\}$ and $G_2 = \{g_{21}(t), g_{22}(t), \ldots, g_{2q}(t)\}$ to maximize

$$\bar{d}^2 = [q(q-1)]^{-1} \sum_{i=1}^{q} \sum_{j=1}^{q} \int_0^T \left([g_{1i}(t) - g_{1j}(t)]^2 + [g_{2i}(t) - g_{2j}(t)]^2\right) dt \tag{63}$$


subject  to  the  following  constraints: 



(i) the equidistance constraint: the quantities in (62) should be the same whenever $i \ne j$.

(ii) average energy constraint: (15) should be satisfied for both signal sets $G_1$ and $G_2$.

(iii) peak-amplitude constraint: (17) should be satisfied for both signal sets $G_1$ and $G_2$.

By paralleling the development leading to Lemma 1, we have the following.

Lemma 1': Given the maximum average signal counts per channel use in each polarization component and given (17) for both signal sets $G_1$ and $G_2$, then the upper bound (64) on $\bar{d}^2$ holds. Furthermore, equality holds if and only if both: (a) at any time $t \in [0,T]$, all signals in $G_1$ and $G_2$ take on the value $g_{\min}$ except at most one in $G_1$ and one in $G_2$, which take on the value $g_{\max}$; and (b),

q 

i S 

i=l 

for both $k=1$ and $k=2$.

A signal set that is equidistant, in the sense that the quantities in (62) are the same whenever $i \ne j$, and that achieves the upper bound in Lemma 1' with equality, and which therefore maximizes $\bar{d}^2$ when the average energy constraint predominates in each polarization component, is characterized in the following lemma for $q$ even.


Lemma 2': If $q$ is even and $\bar{s}$ satisfies (19a), equality is achieved in (64) by the following signal set: for $1 \le i \le q/2$ and $j = i + (q/2)$,

$$g_{1i}(t) = g_{2j}(t) = \begin{cases} g_{\max}, & (i-1)2T/q \le t < (i-1+\varepsilon)2T/q \\ g_{\min}, & \text{otherwise for } 0 \le t \le T \end{cases}$$

$$g_{1j}(t) = g_{2i}(t) = g_{\min}, \quad 0 \le t \le T,$$

where $\varepsilon$ is given in (26).

The verification of Lemma 2' is straightforward, paralleling the verification of Lemma 2. It is interesting to note for q=4 that this signal set, then called "quaternary pulse modulation," is used in the one gigabit per second optical communication system reported by M. Ross, et al. [15].
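
The equidistance property of the signal set in Lemma 2' can be checked numerically. The sketch below samples the signals on a uniform grid and evaluates a Riemann-sum version of (62) for every pair. The value of ε is treated as a free parameter because (26) is not reproduced here, and the function names, grid resolution and amplitude values are illustrative assumptions.

```python
def lemma2_signal_set(q, T, g_min, g_max, eps, samples_per_slot=100):
    """Return (signals, dt); signals[idx] = (g1, g2), each sampled on [0, T)."""
    if q % 2 != 0:
        raise ValueError("q must be even")
    half = q // 2                        # number of time slots, each of width 2T/q
    n = half * samples_per_slot          # total grid points on [0, T)
    dt = T / n
    pulse_len = round(eps * samples_per_slot)
    signals = []
    for idx in range(q):
        slot = idx % half                # signals i and i + q/2 share a slot
        pulsed = [g_min] * n
        start = slot * samples_per_slot
        for s in range(start, start + pulse_len):
            pulsed[s] = g_max
        flat = [g_min] * n
        # First half pulses polarization 1, second half pulses polarization 2.
        signals.append((pulsed, flat) if idx < half else (flat, pulsed))
    return signals, dt


def squared_distance(sig_a, sig_b, dt):
    """Riemann-sum approximation of the quantity in (62) for two signals."""
    return dt * sum(
        (a - b) ** 2
        for comp_a, comp_b in zip(sig_a, sig_b)
        for a, b in zip(comp_a, comp_b)
    )


# Hypothetical parameters: q = 4, T = 1, g_min = 1, g_max = 2, eps = 0.25.
signals, dt = lemma2_signal_set(4, 1.0, 1.0, 2.0, 0.25)
distances = [
    squared_distance(signals[i], signals[j], dt)
    for i in range(4) for j in range(4) if i != j
]
```

Every entry of `distances` comes out equal to 2ε(2T/q)(g_max − g_min)², whether the two signals differ in time slot, in polarization, or in both.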
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TABLE  2.  Degradation  Due  to  Finite  Quantization 


(·)_eff     s*      (·)_0*    P*       s       10 log(s/s*)

   1        2.35    0.53      0.038     3.8    2.09
   5        4.86    0.65      0.020     7.0    1.58
  10        6.65    0.68      0.016     9.25   1.43
  20        9.10    0.70      0.014    12.7    1.45
  40       12.45    0.71      0.013    16.9    1.33
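
The last column of Table 2 can be recomputed from the s* and s columns. The short check below does this. The tuple layout and names are illustrative, and only the three columns needed for the check, together with the row label, are carried over from the table.

```python
import math

# Rows of Table 2 as (first column, s*, s, tabulated 10 log(s/s*)).
TABLE_2 = [
    (1, 2.35, 3.8, 2.09),
    (5, 4.86, 7.0, 1.58),
    (10, 6.65, 9.25, 1.43),
    (20, 9.10, 12.7, 1.45),
    (40, 12.45, 16.9, 1.33),
]


def loss_db(s, s_star):
    """Energy penalty in decibels: 10 log10(s / s*)."""
    return 10.0 * math.log10(s / s_star)


recomputed = [(row[0], round(loss_db(row[2], row[1]), 2)) for row in TABLE_2]
```

The recomputed values agree with the tabulated ones to within rounding of the two-decimal entries.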
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APPENDIX  (derivation  of  (27)) 


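Let $\Omega = \{1,2,\ldots,q\}$ and define $J^*$ by

$$J^* = \max_{g_k(\cdot),\, k \in \Omega} \int_0^T \sum_{i,j \in \Omega} [g_i(t) - g_j(t)]^2 \, dt. \tag{A1}$$

Then

$$J^* \le T \max_{t \in [0,T]} \ \max_{g_k(t),\, k \in \Omega} \ \sum_{i,j \in \Omega} [g_i(t) - g_j(t)]^2, \tag{A2}$$

with equality if and only if the integrand in (A1) is a constant independent of $t$. Thus, we consider the problem of choosing $q$ real numbers $g_k$, $k \in \Omega$, to maximize

$$I(g) = \sum_{i,j \in \Omega} (g_i - g_j)^2 \tag{A3}$$

subject to $g_{\min} \le g_i \le g_{\max}$, $i \in \Omega$. A necessary condition for $g_i^*$, $i \in \Omega$, to minimize $-I$ (and so maximize $I$) is the existence of $2q$ real numbers $\mu_i, \nu_i$, $i = 1,2,\ldots,q$, such that [16]$^{1}$

$$L(g^*, \mu, \nu) \le L(g, \mu, \nu), \quad \text{for all } g \text{ in } R^q, \tag{A4}$$

$$g_i^* < g_{\max} \ \text{implies} \ \mu_i = 0, \tag{A5}$$

$$g_i^* > g_{\min} \ \text{implies} \ \nu_i = 0, \tag{A6}$$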

where the Lagrangian $L$ is defined by

$$L(g,\mu,\nu) = -\sum_{i,j \in \Omega} (g_i - g_j)^2 + \sum_{i \in \Omega} \mu_i (g_i - g_{\max}) + \sum_{i \in \Omega} \nu_i (g_{\min} - g_i). \tag{A7}$$

Equating to zero the derivative of $L$ with respect to $g_i$, we obtain for $i \in \Omega$

$$-4q\, g_i^* + 4q\, c^* + \mu_i - \nu_i = 0, \tag{A8}$$

where $c^* = q^{-1} \sum_{i \in \Omega} g_i^*$. From (A5) and (A6), if $g_{\min} < g_i^* < g_{\max}$, then $\mu_i = \nu_i = 0$, and

$$g_i^* = c^*. \tag{A9}$$

Thus, each $g_i^*$ takes on one of three values: $g_{\min}$, $g_{\max}$, or $c^*$. Let there be $n_{\min}$, $n_{\max}$, and $q - n_{\min} - n_{\max}$ of these, respectively. Then, from the definition of $c^*$, we have

$$q c^* = n_{\min} g_{\min} + n_{\max} g_{\max} + (q - n_{\min} - n_{\max}) c^*$$

or

$$c^* = (n_{\min} g_{\min} + n_{\max} g_{\max}) / (n_{\min} + n_{\max}). \tag{A10}$$

Furthermore,

$$\sum_{i,j \in \Omega} (g_i^* - g_j^*)^2 = 2 n_{\min} n_{\max} (g_{\max} - g_{\min})^2 + 2 n_{\max} (q - n_{\min} - n_{\max}) (g_{\max} - c^*)^2 + 2 n_{\min} (q - n_{\min} - n_{\max}) (c^* - g_{\min})^2. \tag{A11}$$

Substituting (A10) into (A11) and simplifying, we obtain

$$-I(g^*) = L(g^*, \mu, \nu) = -2q (g_{\max} - g_{\min})^2 \left( \frac{1}{n_{\min}} + \frac{1}{n_{\max}} \right)^{-1} \tag{A12}$$


and this is to be minimized subject to $0 < n_{\min} + n_{\max} \le q$, $n_{\min}$ and $n_{\max}$ being non-negative integers, which is the same as the minimization of $(1/n_{\min}) + (1/n_{\max})$ with the same constraints. The solution to this is: for $q$ even, $n_{\min} = n_{\max} = q/2$; and for $q$ odd, either $n_{\min} = (q+1)/2$ and $n_{\max} = (q-1)/2$ or $n_{\min} = (q-1)/2$ and $n_{\max} = (q+1)/2$.$^{2}$ Thus, for $q$ even, we have

$$I(g^*) = \tfrac{1}{2} q^2 (g_{\max} - g_{\min})^2 \tag{A13}$$

and, for $q$ odd,

$$I(g^*) = \tfrac{1}{2} (q+1)(q-1) (g_{\max} - g_{\min})^2. \tag{A14}$$

The corresponding upper bounds on $\bar{d}^2 = q^{-1}(q-1)^{-1} J^*$ are, from (A1),

$$\bar{d}^2 \le \begin{cases} \tfrac{1}{2}\, q (q-1)^{-1} (g_{\max} - g_{\min})^2\, T, & q \text{ even} \\ \tfrac{1}{2}\, (q+1)\, q^{-1} (g_{\max} - g_{\min})^2\, T, & q \text{ odd} \end{cases} \tag{A15}$$
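
The closed forms (A13) and (A14) can be cross-checked by brute force for small q. The sketch below relies on one assumption beyond the text: since the sum of squared differences is a convex function of g, its maximum over the box g_min ≤ g_i ≤ g_max is attained at a vertex, so only vectors with entries in {g_min, g_max} are enumerated. Function names and the test amplitudes are illustrative.

```python
from itertools import product


def pair_sum(g):
    """Sum over all ordered pairs (i, j) of (g_i - g_j)^2, as in (A3)."""
    return sum((a - b) ** 2 for a in g for b in g)


def best_vertex_value(q, g_min, g_max):
    """Maximum of pair_sum over the 2^q vertices of the box [g_min, g_max]^q."""
    return max(pair_sum(g) for g in product((g_min, g_max), repeat=q))


def closed_form(q, g_min, g_max):
    """Right-hand side of (A13) for q even and of (A14) for q odd."""
    delta_sq = (g_max - g_min) ** 2
    if q % 2 == 0:
        return 0.5 * q * q * delta_sq
    return 0.5 * (q + 1) * (q - 1) * delta_sq


# Compare the enumeration with the closed forms for q = 2, ..., 8.
checks = {q: (best_vertex_value(q, 1, 3), closed_form(q, 1, 3)) for q in range(2, 9)}
```

For each q the two numbers in `checks[q]` coincide, which is consistent with splitting the signals as evenly as possible between g_min and g_max.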
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FOOTNOTES 


This is a necessary condition for a regular point $g^*$ to minimize $-I$ subject to $g_{\min} \le g_i \le g_{\max}$, $i \in \Omega$. Since $g_i - g_{\max} \le 0$ and $g_{\min} - g_i \le 0$ cannot be simultaneously active (that is, satisfied with equality), it is evident that the set of gradient vectors of the active constraints can include $e_i$ (the $i$-th natural basis vector) or $-e_i$ but not both (and possibly neither). Thus, the set of gradient vectors of the active constraints is linearly independent for any $g$, and any $g$ is therefore regular.


Because  this  is  the  only  solution  to  the  necessary  conditions  (A4)-(A6), 
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ABSTRACT 

A decentralized algorithm for determining the shortest paths in a network is presented. Using information received only from neighboring nodes, a sequence of additions and comparisons is performed at each node in the network. Convergence to the optimal solution takes place in finite time.


I.  INTRODUCTION 

A common problem in graph theory is that of finding the shortest paths between all pairs of nodes in a network, and numerous algorithms exist for its solution, e.g. [1], [2], [3]. Nearly all of these algorithms are developed under the presumption that the computations will be performed by a decision maker with knowledge of the entire graph topology and of all branch lengths. The implementation of algorithms of this type can be thought of as requiring each node to transmit distance and topology information to a central controller, which is then responsible for solving the problem. After the shortest paths have been determined, the controller sends the appropriate routing information to each of the nodes. In a large network this could involve a significant amount of communication. Additionally, for some networks establishment of a central controller may be expensive, infeasible, or undesirable from a security or reliability viewpoint.

The purpose of this paper is to present a decentralized shortest path algorithm in which each node computes its shortest distance to each other node, while requiring communication only with its adjacent nodes, thus eliminating the need for a control center. This algorithm does not arise from any new concept; it is based primarily on a shortest path algorithm of Ford and Fulkerson [4]. In a similar spirit to the modifications made by Lau, Persiano and Varaiya [5] to a similar algorithm for the maximum flow problem, the Ford and Fulkerson algorithm is modified and reinterpreted in order to extract and emphasize its localized information requirements.

Very little topological information is needed. Each node needs to know only which nodes are attached to the incoming branches, which are attached to the outgoing ones, and the lengths of the links to the outgoing nodes.

A node sends information only to its incoming nodes. For each ultimate destination, a node calculates an estimate of the shortest path via each of its outgoing links; the smallest of these is taken to be its estimate of the shortest path to that destination. All of these approximate shortest distances will have become true shortest distances by the time the algorithm converges. Convergence is guaranteed in finite time, even if the algorithm is implemented by the nodes of the network in an asynchronous manner.

II. THE ALGORITHM

Consider a directed graph consisting of N nodes, denoted {1,2,...,N}, and a collection of arcs, $A = \{(i,j) : i,j \in N$ and there exists a direct link from i to j$\}$. To each arc $(i,j) \in A$ is associated a length $\ell(i,j)$. These lengths could represent physical distance, time, energy, money, or any other quantity suitable for the network of interest. The lengths are unrestricted in sign, but the sum of the lengths in any closed loop of the network is assumed to be positive. Also, for every $i \in N$ define

$$I(i) = \{j : (j,i) \in A\},$$

$$J(i) = \{j : (i,j) \in A\}.$$

We will refer to I(i) as the set of incoming nodes to i, and J(i) as the set of outgoing nodes from i. Each node maintains a matrix of shortest distances to each ultimate destination via each outgoing branch. In Fig. 1, d(i,j;k) represents the "current shortest distance" from i to j, given that k must be the next node along any path considered,

$$d(i,j) = \min_k d(i,j;k),$$

and n(i,j) is the next node on the path that achieves the distance d(i,j). Row i is crossed out because it would represent distances from i to itself. For any $j \notin J(i)$, $j \ne i$, column j is crossed out because no direct link exists from i to j.

Initialization of the matrices requires only local topological information. Each node i begins by crossing out row i and all appropriate columns, as discussed previously. The diagonal elements represent direct link distances to other nodes. Thus, every diagonal element which is not crossed out (viz., those in the columns of the outgoing nodes) is assigned the length of the arc from i to the given outgoing node. Assume $\ell(i,j) = M$, a very large number, whenever $(i,j) \notin A$. Then in column i, $d(i,j) = \ell(i,j)$ and $n(i,j) = j$. In other words, since direct paths are the only ones known at this time, column i, which consists of the shortest paths to each destination based on information received to date, initially contains the lengths of the direct links to each node. Note that whenever d(i,j) = M, this indicates that no real path from i to j has yet been found. (M should be treated like $\infty$: M + d = M for any "real distance" d.)

Now, for purposes of analysis, imagine stacking the N × N distance matrices in order, one above the other, to form a distance cube with the matrix of node 1 at the top, and that of node N at the bottom. The basic idea behind the algorithm is the following. Suppose that node i makes a change (this includes the initialization step) in some d(i,j) component in column i. The only distances that are directly affected by this change are the distances to j, via i, for each of the incoming nodes to i. The distance cube is arranged in such a way that these affected elements are those that are not crossed out and lie along the vertical line that passes through d(i,j). That is, information transmission is purely vertical. Thus, distances to other nodes are received from the set of outgoing nodes, making it possible to calculate distances via these nodes.

More specifically, at each node i the algorithm is begun by initializing the distance matrix. Each distance in column i must then be transmitted up and down the corresponding vertical line. Node i now does nothing until a new distance, say to node k, is received from an outgoing neighbor j. To calculate the distance to k via j, the distance received from j must be added to the direct distance, $\ell(i,j)$, from i to j, which can always be found in the (j,j) element of matrix i. This sum is the new d(i,k;j) and replaces that stored in the (k,j) element of node i's matrix. If it is larger than d(i,k), node i does nothing because a better path has already been found. If it equals d(i,k), j can be included in n(i,k) because there is a tie for shortest path. If it is less than d(i,k), it becomes the "current shortest distance" from i to k, so in the (k,i) element of its distance matrix, node i replaces d(i,k) by d(i,k;j), and n(i,k) by j, and transmits this new distance along the vertical line through its (k,i) element. The algorithm continues in this manner until no more changes can be made. At this point, each node will know the shortest distance to each destination (or that no path exists, which will be reflected as a shortest path length M), the next node in the path that achieves this distance, and the shortest distance via each alternative outgoing node.
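
The update rule described above can be sketched as a small single-process simulation. This is an illustrative sketch under stated assumptions, not the authors' implementation: a FIFO queue stands in for asynchronous transmission, `float("inf")` plays the role of M, entries equal to M are not transmitted (they could never improve a receiver's estimate), ties are not recorded in n(i,k), and all function and variable names are hypothetical.

```python
from collections import deque

M = float("inf")  # stands in for the "very large number" M; inf + d == inf


def decentralized_shortest_paths(num_nodes, arcs):
    """Simulate the event-driven algorithm on nodes 1..num_nodes.

    arcs maps (i, j) to the length of the directed link from i to j.
    Returns (d, n): d[i][j] is node i's shortest distance to j and n[i][j]
    the next node on a path achieving it (None if no path was found).
    """
    nodes = range(1, num_nodes + 1)
    incoming = {i: [j for j in nodes if (j, i) in arcs] for i in nodes}  # I(i)
    outgoing = {i: [j for j in nodes if (i, j) in arcs] for i in nodes}  # J(i)

    # via[i][j][k] plays the role of d(i,j;k); only columns k in J(i) are stored.
    via = {i: {j: {k: M for k in outgoing[i]} for j in nodes if j != i} for i in nodes}
    d = {i: {j: M for j in nodes if j != i} for i in nodes}
    n = {i: {j: None for j in nodes if j != i} for i in nodes}

    queue = deque()  # messages: (receiver, sender, destination, distance)

    def announce(i, dest):
        # A node sends a changed distance only to its incoming nodes.
        for r in incoming[i]:
            if r != dest:  # node dest keeps no distance to itself
                queue.append((r, i, dest, d[i][dest]))

    # Initialization: only direct links are known.
    for i in nodes:
        for k in outgoing[i]:
            via[i][k][k] = arcs[(i, k)]
            d[i][k] = arcs[(i, k)]
            n[i][k] = k
            announce(i, k)

    # Each message triggers one addition, one comparison, and possibly a replacement.
    while queue:
        i, j, k, dist = queue.popleft()
        new = arcs[(i, j)] + dist  # direct distance to j plus j's distance to k
        via[i][k][j] = new
        if new < d[i][k]:
            d[i][k] = new
            n[i][k] = j
            announce(i, k)
    return d, n


# Hypothetical five-node example; node 5 has no incoming arcs.
example_arcs = {(1, 2): 1, (2, 3): 2, (1, 3): 5, (3, 4): 1,
                (4, 1): 3, (2, 4): 7, (5, 1): 2}
dist, nxt = decentralized_shortest_paths(5, example_arcs)
```

In the example, node 1 initially records the direct length 5 to node 3 and later replaces it with the shorter route via node 2 once node 2's distance to node 3 arrives. Distances to node 5 stay at M because no node has a path into it.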

Observe that in any given node's distance matrix the operations in any row are self-contained and independent of those in any other row: the initialization step and each subsequent addition and comparison operation involves only elements in a specific row. This means that the operations performed by a given node for one ultimate destination are independent of those for any other ultimate destination. It has also been noted that in the distance cube constructed by vertically stacking the individual distance matrices, the only communication takes place along vertical lines. This reflects the fact that the information transfer concerning one particular ultimate destination is independent of that for any other. Together, these observations mean that both information transfer and addition-comparisons are independent from one ultimate destination to the next. In the distance cube, this means that the vertical "slices" corresponding to each fixed ultimate destination are self-contained in so far as both communication and addition-comparison operations are concerned. This decomposition property is the basis for our proof of convergence in the next section. It should be emphasized, however, that the topological information required to construct and update each of these vertical slices is not localized. Indeed, interpretation of the algorithm for these vertical planes is closely related to the "centralized" Ford-Fulkerson algorithm. What is important for our purposes is that while the decentralized nature of the algorithm is exhibited by separating the distance cube into horizontal slices, convergence is most easily proven by thinking of the cube as being separated into vertical slices that are self-contained and for which convergence can be proven separately and individually. Clearly, the two are equivalent since they are simply alternative decompositions of the same distance cube.

III.  CONVERGENCE  OF  THE  ALGORITHM 

The convergence of the algorithm will be proved, by induction, for an arbitrary vertical slice. Since the algorithm can be applied independently to each vertical matrix, convergence for a vertical slice implies convergence for the entire distance cube. Consider the vertical matrix composed of all of the row j's, i.e., corresponding to node j being the common ultimate destination. This matrix will take the form given in Fig. 2. Row j is crossed out because node j is not interested in distances to itself. As an example of the fact that arcs may not exist between all pairs of nodes, the (1,2) element has been crossed out, indicating that in this case the graph contains no direct link from node 1 to node 2. Each diagonal element contains the current shortest distance to destination j and the next node in the corresponding shortest path. Distance changes in any diagonal element will be communicated throughout the corresponding column.

Suppose j is an isolated node, so that no node has a path into j. Then each diagonal element will initially have d(i,j) = M. After these M's have all been transmitted, all distances in the vertical matrix will have the value M, no further changes can be made and the algorithm stops. Thus, the final matrix does indeed indicate that no node can find a path to j. Now suppose that at least one node has a path to j. Define

$$S(m) = \{i : \exists \text{ a shortest path from } i \text{ to } j \text{ containing exactly } m \text{ arcs}\}.$$

Since for some node at least one path exists to j, an optimal path exists. By the Principle of Optimality, the last link in this path, say (k,j), must be a shortest path from k to j. Thus $k \in S(1)$ and S(1) is not empty. The diagonal element corresponding to each node in S(1) will initially contain the direct distance to j. By the definition of S(1), these distances are optimal. At some time, T(1), each of these distances will have been transmitted throughout its column, and each of the columns corresponding to the nodes in S(1) will now be optimal. In general, define T(m) to be the time at which each node in S(m) has transmitted its true shortest distance to j. Now assume that at time T(m), the columns associated with the nodes in $S(1) \cup S(2) \cup \cdots \cup S(m)$ have been optimized. By the Principle of Optimality, for each node in S(m+1), the shortest paths to j with m+1 arcs must all involve going first through a node in S(m). Therefore, at time T(m), after each of the nodes in S(m) has transmitted its shortest distance, each row corresponding to a node in S(m+1) will have the shortest distance to j in one of its S(m) columns. Since the distances in each of these rows are actual distances, they must be greater than or equal to shortest distances to j. Thus, after comparisons are performed, the diagonal elements in the S(m+1) rows will contain true shortest distances to j. These distances will be transmitted, and at time T(m+1), the columns corresponding to the $S(1) \cup S(2) \cup \cdots \cup S(m+1)$ nodes will all be optimal. The only fact still to be proven is that the algorithm stops. Let m' be the smallest integer such that $S(1) \cup S(2) \cup \cdots \cup S(m')$ contains all of the nodes in the graph. The existence of such an integer is guaranteed by the fact that m' cannot exceed N-1, which is the maximum number of links possible in any shortest path. Obviously, at time T(m'), every column in the matrix will consist entirely of optimal distances. At this point, the algorithm must stop.

In order to calculate an upper bound on the number of operations required, the algorithm is modified to operate on a synchronous basis. The event-driven nature of the original algorithm, i.e. sending new distances as soon as they are calculated, is quite convenient for the nodes using the algorithm. However, this characteristic makes it difficult, if not impossible, to bound the number of operations. So we assume instead that nodes are forced to take turns, being allowed to transmit once every N units of time. Define a triple operation to be the sequence of performing an addition, a comparison, and a replacement if necessary. Again, consider an arbitrary vertical slice of the cube. During the first cycle of transmissions, the distances in each of the N-1 diagonal elements will be propagated, and each will cause at most N-2 triple operations. At the conclusion of this cycle, at least one column is optimal, and does not enter into future calculations. During the second cycle, at most N-2 diagonal elements will transmit changes, so a maximum of (N-2)(N-2) triple operations are performed during this cycle. The upper bound for total operations for one vertical matrix is given by

$$(N-2)(N-1) + (N-2)(N-2) + \cdots + (N-2) = \tfrac{1}{2} N(N-1)(N-2).$$

Therefore, the entire distance cube will be optimal after at most $\tfrac{1}{2} N^2 (N-1)(N-2)$ triple operations. This is an order of magnitude larger than the algorithm in Hu [1], but there are other considerations. The calculations are shared by all of the nodes of the network, and each node does approximately the same amount of work as a central controller would. The principal advantage of the algorithm is that each node computes its own shortest distance matrix.
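
The counting argument above can be checked by summing the per-cycle bounds directly. In the sketch below, cycle c (c = 1, ..., N-1) is assumed to involve at most N-c transmitting diagonal elements, each triggering at most N-2 triple operations; the function names are illustrative.

```python
def slice_bound(num_nodes):
    """Upper bound on triple operations for one vertical slice (one destination)."""
    return sum((num_nodes - 2) * (num_nodes - c) for c in range(1, num_nodes))


def cube_bound(num_nodes):
    """Upper bound for the whole distance cube: one slice per destination."""
    return num_nodes * slice_bound(num_nodes)


# Hypothetical network size used only for illustration.
N = 10
per_slice = slice_bound(N)
whole_cube = cube_bound(N)
```

For every N the per-slice sum equals N(N-1)(N-2)/2, so the bound for the cube grows like N to the fourth power, while the work done by any single node grows like N cubed.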

IV.  MODIFICATIONS  AND  EXTENSIONS 

Practical implementation of the algorithm will obviously differ somewhat from the form given above, which has been chosen with ease of presentation in mind. Construction and maintenance of the entire N × N distance matrix by each node would be inefficient. Storage space should be allocated only for the columns corresponding to the set of outgoing nodes, as well as the column containing the "current shortest distances" and "next nodes".

The distance cube cannot be constructed by any single node because that would require knowledge of the full topology of the network, so information transfer must be handled differently. In the distance cube model distances are propagated vertically, but these distances are ignored whenever the vertical line passes through a crossed-out element. Those elements along the line which are not crossed out correspond precisely to the set of incoming nodes. Thus, when a node i has a new distance to be transmitted, it sends it only to the nodes $j \in I(i)$. The message sent must include the identity of the sender, the name of the destination to which the distance has changed, and the new distance to that destination. The addition-comparison operation remains unchanged.

An additional topic of interest is topological changes in the network. A problem arises when a topological change causes one or more shortest paths to get longer. This can occur when an arc length increases or when there is a breakdown in a node or link. It is crucial in the convergence proof that when a d(i,j;k) assumes the value of the true shortest path from i to j, it must be the smallest distance in row j and d(i,j) will be assigned this optimal value. It is the propagation of these true shortest distances which guarantees the convergence of the algorithm. Fix an ultimate destination j and consider the corresponding vertical matrix. Suppose a topological change occurs, and some shortest distances increase. For the new topology, we can construct the sets S'(m), which may differ from the sets S(m). However, the d(i,j;k)'s in some rows may no longer be valid; they could be smaller than real distances to j if they correspond to paths affected by the topological change. Thus, in the convergence proof, there is no guarantee that when a true shortest distance enters a row, it will be smaller than the other distances in the row. It is possible that the algorithm will converge in some such cases, but a result along these lines has not yet been proved.

On the other hand, if a topological change decreases some shortest paths, while none increase, convergence can be proven. Suppose the change occurs, and consider vertical matrix j. Define the sets S'(m) for the new topology. Since the nodes directly affected by a topological change will know about this change immediately, the distances in column j, i.e. the lengths of the direct links into j, will be correct immediately after the topological change. In particular, these direct distances will be in the rows of the S'(1) nodes. The other distances in each of these rows will either be actual distances, or will be larger than actual distances. Therefore, the distance in column j will be the minimum in each S'(1) row, and will be assigned to the diagonal element of the row. Then at time T'(1), the S'(1) columns will be optimal. The same argument is valid in the inductive step of the convergence proof.

V.  CONCLUSION 

A decentralized algorithm for finding shortest paths between all pairs of nodes in a graph has been presented. Each node requires only local topological information. All communication in the algorithm is between adjacent nodes. The algorithm can be implemented asynchronously, an advantage in networks where the individual nodes have different processing capabilities. The algorithm may require more operations than a centralized scheme, but the calculations are divided up among all of the nodes in the graph. There is also the unanswered question of convergence of the algorithm after certain topological changes. However, most importantly, the algorithm finds all optimal paths in the network, while giving the users the advantages of decentralized topological and information requirements.

QUANTIZATION LOSS IN OPTICAL COMMUNICATION SYSTEMS*
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St.  Louis,  Missouri  63130 

ABSTRACT 

In a digital communication system employing an optical carrier and direct detection and having five to forty dark-current counts per channel use, about 1.5 dB more signal energy is required with hard decisions to achieve the same cutoff rate as with infinitely soft decisions.
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Reprint  of  Abstract: 


"Some  Implications  of  the  Cutoff  Rate  Criterion  for  Coded,  Direct-Detec- 
tion, Optical  Communication  Systems,"  Donald  L.  Snyder  and  Ian  B. 
Rhodes,  Abstracts  of  Papers,  1979  IEEE  International  Information  Theory 
Symposium,  Grignano,  Italy,  June  25-29,  1979,  Session  F2. 
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SESSION  F2 


SOME IMPLICATIONS OF THE CUTOFF RATE CRITERION FOR CODED, DIRECT-DETECTION, OPTICAL COMMUNICATION SYSTEMS, Donald L. Snyder and Ian B. Rhodes (Washington University, St. Louis, Missouri 63130). The cutoff rate is derived for a digital communication system employing an optical carrier and direct detection. The coordinated design of the optical modulator and demodulator is then studied using the cutoff rate as a performance measure rather than the more commonly employed error probability. The best choice of optical modulation is identified for various relationships between peak amplitude and average energy constraints on the transmitted optical field. When the average energy constraint is predominant, pulse position modulation is shown to maximize the cutoff rate. When the peak amplitude constraint is predominant, Hadamard matrices can be used to define an optimum choice of modulation. Problems of efficient energy utilization, choice of input and output alphabet size, and the effect of random detector gain are addressed.
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